ABSTRACT
We examine our proposed word separator for Thai script, called two-level tokenization (2LT), by applying it to medical Thai text, including chief complaints and ICD-10 descriptions. We verify the tokenization results through machine learning-based classification. Experiments show that the proposed tokenizer works well with the Classification and Regression Trees (CART) method, achieving 85% precision, 71% recall, and an F1 score of 76%. However, these values are not high enough to make the proposed tokenizer worthwhile. This paper presents how to improve the results of Thai sign and symptom classification. To increase precision, recall, and F1 score, we adapt the context-free grammar (CFG) concept to exclude unnecessary, commonly occurring conjunction words from consideration. Consequently, precision, recall, and F1 score rise from 85%, 71%, and 76% to 93%, 86%, and 89%, respectively. This shows that applying the CFG concept yields higher accuracy than the previous experiments, which did not use it.
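The idea of using a grammar to drop conjunction tokens before classification can be illustrated with a minimal sketch. It is not the authors' implementation: the toy grammar, the conjunction lexicon, and the names `CONJUNCTIONS`, `parse_complaint`, and `f1_score` are all assumptions made for illustration. The sketch assumes the text has already been tokenized, parses the token sequence with a small context-free grammar, and keeps only the non-conjunction tokens.

```python
# Hypothetical conjunction lexicon ("and", "or", "with"); the paper's
# actual word list and grammar rules are not given in the abstract.
CONJUNCTIONS = {"และ", "หรือ", "กับ"}


def parse_complaint(tokens):
    """Parse already-tokenized text with the toy grammar

        Complaint -> Symptom Rest
        Rest      -> Conj Symptom Rest | Symptom Rest | (empty)

    and return only the Symptom tokens, dropping every Conj.
    Raises ValueError if the sequence does not fit the grammar.
    """
    symptoms = []
    pos = 0

    def symptom():
        # Symptom: any token that is not a conjunction.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in CONJUNCTIONS:
            raise ValueError(f"expected a symptom token at position {pos}")
        symptoms.append(tokens[pos])
        pos += 1

    symptom()
    while pos < len(tokens):
        if tokens[pos] in CONJUNCTIONS:
            pos += 1  # Conj: consumed by the parser but not kept
        symptom()
    return symptoms


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# "fever and cough headache" -> the conjunction is removed.
filtered = parse_complaint(["ไข้", "และ", "ไอ", "ปวดหัว"])
```

Here `filtered` holds the three symptom tokens without the conjunction, and these would then be passed to the classifier. As a check on the metric, `f1_score(0.93, 0.86)` evaluates to about 0.89.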
Index Terms
- Improving Accuracy in Thai Sign and Symptom Classification using Context-Free Grammar Approach