Correcting and Standardizing Crude Drug Names in Traditional Medicine Formulae by Ensemble of String Matching Techniques

Pakdeesattayapong, Duangkamol; Lertnattee, Verayuth

doi:10.1007/978-3-319-22186-1_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9226)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Crude drug names in traditional herbal formulae are commonly recorded with spelling errors, grammatical variants, synonyms, and inconsistent formats. To make these names unambiguous and useful, they need to be corrected and standardized. In this work, crude drug names in various forms were corrected and standardized using string matching techniques. A set of experiments was conducted using crude drug names from the database of registered traditional medicines of the Thai Food and Drug Administration as the test set. Two well-known algorithms, similar text and Levenshtein distance, were investigated. Used individually, each algorithm matched the crude drug names in the test set to those in the standard set only moderately well. To improve on these single algorithms, an ensemble algorithm was proposed. The results show that the ensemble algorithm outperforms the single algorithms at matching crude drug names, especially names that carry modifiers with no significant meaning.
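The abstract does not say how the two algorithms are combined, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes the ensemble is an unweighted average of a length-normalized Levenshtein similarity and a similar-text ratio (computed by recursively counting longest common substrings), with a hypothetical acceptance threshold of 0.6. The function names, the threshold, and the sample names are all assumptions introduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map the edit distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def common_chars(a: str, b: str) -> int:
    """Count shared characters: take the longest common substring,
    then recurse on the pieces to its left and to its right."""
    best_len = best_i = best_j = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best_len:
                best_len, best_i, best_j = k, i, j
    if best_len == 0:
        return 0
    return (best_len
            + common_chars(a[:best_i], b[:best_j])
            + common_chars(a[best_i + best_len:], b[best_j + best_len:]))


def similar_text_ratio(a: str, b: str) -> float:
    """Similar-text score scaled to [0, 1]."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2.0 * common_chars(a, b) / total


def ensemble_match(name, standard_names, threshold=0.6):
    """Return (best standard name or None, score).
    Assumption: the ensemble is a plain average of the two scores."""
    query = name.strip().lower()
    best_name, best_score = None, 0.0
    for candidate in standard_names:
        target = candidate.strip().lower()
        score = (levenshtein_similarity(query, target)
                 + similar_text_ratio(query, target)) / 2
        if score > best_score:
            best_name, best_score = candidate, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Illustrative standard set; not taken from the paper's data.
standard = ["Zingiber officinale", "Curcuma longa"]
print(ensemble_match("zingber officinale", standard))
```

Averaging is only one way to combine the scores; voting or taking the maximum would be equally plausible readings of "ensemble", and the threshold would need tuning against a labeled test set.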

Acknowledgments

This work was supported by the Higher Education Research Promotion and National Research University Project of Thailand, Office of the Higher Education Commission and also supported by the Graduate of Silpakorn University as well as the Research and Creative Funding Scheme, Faculty of Pharmacy, Silpakorn University (2014).

Author information

Authors and Affiliations

Faculty of Pharmacy, Silpakorn University, Muang, 73000, Nakhon Pathom, Thailand
Duangkamol Pakdeesattayapong & Verayuth Lertnattee

Corresponding author

Correspondence to Verayuth Lertnattee.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Liverpool John Moores University, Liverpool, United Kingdom
Abir Hussain

About this paper

Cite this paper

Pakdeesattayapong, D., Lertnattee, V. (2015). Correcting and Standardizing Crude Drug Names in Traditional Medicine Formulae by Ensemble of String Matching Techniques. In: Huang, DS., Jo, KH., Hussain, A. (eds) Intelligent Computing Theories and Methodologies. ICIC 2015. Lecture Notes in Computer Science, vol 9226. Springer, Cham. https://doi.org/10.1007/978-3-319-22186-1_24

DOI: https://doi.org/10.1007/978-3-319-22186-1_24
Published: 11 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22185-4
Online ISBN: 978-3-319-22186-1
eBook Packages: Computer Science, Computer Science (R0)
