DOI: 10.1145/3396730.3396736
research-article

How Similar is Similar: A Comparison of Bahasa Indonesia and Bahasa Malaysia

Published: 29 May 2020

Abstract

The world has many languages, and some of them closely resemble one another. In this paper, we compare two such languages, Bahasa Indonesia and Bahasa Malaysia. We build a corpus for each language and compare the two. We propose an approach to measuring language similarity that combines similarity in written form with similarity in semantics.
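
The abstract does not say how the two kinds of similarity are computed or combined, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes that written-form similarity can be approximated by Dice's coefficient over character trigram profiles, that a semantic similarity score in [0, 1] is supplied from elsewhere, and that the two are merged by a weighted average. The function names and the `weight` parameter are hypothetical.

```python
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def dice_similarity(a: Counter, b: Counter) -> float:
    """Dice's coefficient on trigram multisets: 2 * |A & B| / (|A| + |B|)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # both texts are too short to yield any trigram
    overlap = sum((a & b).values())  # multiset intersection keeps the minimum counts
    return 2.0 * overlap / total


def combined_similarity(written: float, semantic: float, weight: float = 0.5) -> float:
    """Weighted average of two scores in [0, 1] (assumed combination rule)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * written + (1.0 - weight) * semantic


if __name__ == "__main__":
    # Illustrative sentences only; they are not taken from the paper's corpora.
    indonesian = trigram_profile("saya mau pergi ke kantor")
    malay = trigram_profile("saya mahu pergi ke pejabat")
    written = dice_similarity(indonesian, malay)
    # The semantic score here is a placeholder, not a measured value.
    print(combined_similarity(written, semantic=0.5))
```

In this sketch, a `weight` of 0.5 treats written form and semantics as equally important. Any real choice of weight, and any real semantic score, would have to come from the paper's own method and data.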


Published In

ICECC '20: Proceedings of the 3rd International Conference on Electronics, Communications and Control Engineering
April 2020
73 pages
ISBN:9781450374996
DOI:10.1145/3396730
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Bahasa Indonesia
  2. Bahasa Malaysia
  3. language similarity

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICECC 2020
