Abstract
Crude drug names in traditional herbal formulae are commonly recorded with spelling errors, grammatical variants, synonyms and inconsistent formats. To make these names clearer and more useful, they should be corrected and standardized. In this work, crude drug names in various forms were corrected and standardized using string matching techniques. A set of experiments was conducted using crude drug names from a database of registered traditional medicines at the Thai Food and Drug Administration as the test set. Two well-known algorithms, similar text and Levenshtein, were investigated. With each algorithm alone, crude drug names in the test set matched those of the standard set only moderately well. To improve on the performance of the single algorithms, an ensemble algorithm was proposed. The results show that the ensemble algorithm outperforms the single algorithms at matching crude drug names, especially names with modifiers that have no significant meaning.
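The matching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two scores are combined, so the simple averaging in `ensemble_match` is an assumption, as is reading "similar text" as the recursive longest-common-substring count used by PHP's `similar_text`. All function names and the sample drug names are hypothetical and are not taken from the paper's data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def common_chars(a: str, b: str) -> int:
    """Matching characters: take the longest common substring, then
    recurse on the pieces to its left and to its right."""
    best_len = best_i = best_j = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best_len:
                best_len, best_i, best_j = k, i, j
    if best_len == 0:
        return 0
    return (best_len
            + common_chars(a[:best_i], b[:best_j])
            + common_chars(a[best_i + best_len:], b[best_j + best_len:]))


def similar_text_similarity(a: str, b: str) -> float:
    """Share of matching characters over the combined length, in [0, 1]."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2.0 * common_chars(a, b) / total


def ensemble_match(name: str, standard_names: list[str]) -> tuple[str, float]:
    """Return the standard name with the best combined score.

    Assumption: the ensemble is the plain average of the two similarities.
    """
    query = name.strip().lower()

    def score(candidate: str) -> float:
        c = candidate.lower()
        return (levenshtein_similarity(query, c)
                + similar_text_similarity(query, c)) / 2

    best = max(standard_names, key=score)
    return best, score(best)


# Hypothetical standard set and a misspelled input name.
standard = ["Zingiber officinale", "Curcuma longa", "Piper nigrum"]
match, combined = ensemble_match("Zingber officinale", standard)
print(match, round(combined, 3))
```

In practice a minimum score threshold would probably be needed so that names with no good counterpart in the standard set are left unmatched rather than forced onto the nearest entry; the abstract does not state whether the authors used one.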
Acknowledgments
This work was supported by the Higher Education Research Promotion and National Research University Project of Thailand, Office of the Higher Education Commission, by the Graduate of Silpakorn University, and by the Research and Creative Funding Scheme, Faculty of Pharmacy, Silpakorn University (2014).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Pakdeesattayapong, D., Lertnattee, V. (2015). Correcting and Standardizing Crude Drug Names in Traditional Medicine Formulae by Ensemble of String Matching Techniques. In: Huang, DS., Jo, KH., Hussain, A. (eds) Intelligent Computing Theories and Methodologies. ICIC 2015. Lecture Notes in Computer Science, vol 9226. Springer, Cham. https://doi.org/10.1007/978-3-319-22186-1_24
DOI: https://doi.org/10.1007/978-3-319-22186-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22185-4
Online ISBN: 978-3-319-22186-1
eBook Packages: Computer Science, Computer Science (R0)