Correcting and Standardizing Crude Drug Names in Traditional Medicine Formulae by Ensemble of String Matching Techniques

  • Conference paper
Intelligent Computing Theories and Methodologies (ICIC 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9226)

Abstract

Crude drug names in traditional herbal formulae commonly suffer from spelling errors, grammatical variants, synonyms, and inconsistent formats. To make these names clearer and more useful, they should be corrected and standardized. In this work, crude drug names in various forms were corrected and standardized using string matching techniques. A set of experiments was conducted using crude drug names from the Thai Food and Drug Administration's database of registered traditional medicines as the test set. Two well-known algorithms, similar text and Levenshtein, were investigated. Each algorithm on its own, however, matched the crude drug names in the test set to those in the standard set only moderately well. To improve on the single algorithms, an ensemble algorithm was proposed. The results show that the ensemble algorithm outperforms the single algorithms in matching crude drug names, especially names containing modifiers that carry no significant meaning.
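The two measures named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not say how the ensemble combines the two scores, so the simple average, the 0.8 threshold, the lowercasing step, the function names, and the example plant names below are all assumptions made for the sketch. The `similar_text` function follows the general approach of the PHP function of the same name (longest common substring, then recursion on the left and right remainders).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def similar_text(a: str, b: str) -> int:
    """Count matching characters: longest common substring, then recurse on both sides."""
    best_len = best_i = best_j = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best_len:
                best_len, best_i, best_j = k, i, j
    if best_len == 0:
        return 0
    return (best_len
            + similar_text(a[:best_i], b[:best_j])
            + similar_text(a[best_i + best_len:], b[best_j + best_len:]))


def similar_text_ratio(a: str, b: str) -> float:
    """Matching characters as a fraction of the combined length, in [0, 1]."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2.0 * similar_text(a, b) / total


def ensemble_match(name, standard_names, threshold=0.8):
    """Hypothetical ensemble: average both scores and keep the best standard name.

    Returns (standard_name, score), or (None, score) when the best score
    falls below the assumed threshold.
    """
    query = name.strip().lower()
    best_name, best_score = None, 0.0
    for candidate in standard_names:
        target = candidate.lower()
        score = (levenshtein_similarity(query, target)
                 + similar_text_ratio(query, target)) / 2
        if score > best_score:
            best_name, best_score = candidate, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Illustrative standard set; not taken from the paper's data.
standard = ["Zingiber officinale", "Curcuma longa"]
match, score = ensemble_match("Zingibr officinale", standard)
print(match, round(score, 3))
```

Averaging is only one way to combine the scores; a weighted sum or a vote between the two matchers would fit the same interface, and the threshold would need tuning against a labelled test set.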

Acknowledgments

This work was supported by the Higher Education Research Promotion and National Research University Project of Thailand, Office of the Higher Education Commission; by the Graduate of Silpakorn University; and by the Research and Creative Funding Scheme, Faculty of Pharmacy, Silpakorn University (2014).

Corresponding author

Correspondence to Verayuth Lertnattee.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Pakdeesattayapong, D., Lertnattee, V. (2015). Correcting and Standardizing Crude Drug Names in Traditional Medicine Formulae by Ensemble of String Matching Techniques. In: Huang, DS., Jo, KH., Hussain, A. (eds) Intelligent Computing Theories and Methodologies. ICIC 2015. Lecture Notes in Computer Science, vol 9226. Springer, Cham. https://doi.org/10.1007/978-3-319-22186-1_24

  • DOI: https://doi.org/10.1007/978-3-319-22186-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22185-4

  • Online ISBN: 978-3-319-22186-1

  • eBook Packages: Computer Science, Computer Science (R0)
