ABSTRACT
Comments in source code are a form of inline documentation that programmers write to help others understand what a program does. Students in a basic programming course need to learn how to write better code comments, but assessing those comments can be difficult for the lecturer. Therefore, the author proposes an automatic source code comment assessment method for an online judge system, using a corpus-based text similarity approach. Word2vec, GloVe, and fastText models are used to train word vectors on the Indonesian Wikipedia dump, and similarity is measured using Word Mover's Distance (WMD). Experiments were carried out with varying numbers of epochs during training. The models are compared by Spearman's rho correlation coefficient, mean absolute error (MAE), and performance measurements. The proposed word embedding methods do not yet give good results.
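The two agreement measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the author's evaluation code: it assumes that each student comment receives one numeric score from the similarity method and one reference score from the lecturer, and the function names (`average_ranks`, `spearman_rho`, `mean_absolute_error`) are hypothetical helpers introduced here.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_absolute_error(predicted: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference scores."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)
```

Spearman's rho only looks at whether the method orders the comments the same way the lecturer does, while MAE measures how far the predicted scores are from the lecturer's scores on the same scale, so the two can disagree about which model is better.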