Symptom selection for multi-label data of inquiry diagnosis in traditional Chinese medicine

  • Research Paper
  • Science China Information Sciences

Abstract

In traditional Chinese medicine (TCM) diagnosis, a patient may be associated with more than one syndrome tag, so computer-aided TCM diagnosis is a typical application of multi-label learning on high-dimensional data. A large number of symptoms commonly occur in TCM diagnosis, which complicates the modeling of diagnostic algorithms. Feature selection aims to choose the smallest subset of relevant symptoms while maximizing the generalization performance of the model. At present, there is little research on feature selection for multi-label data. In this paper, a hybrid optimization technique is introduced for symptom selection on multi-label data in TCM diagnosis, and models are built with four multi-label learning algorithms, including k-nearest neighbors. We compare the performance of the proposed algorithm with currently popular dimensionality reduction algorithms, such as MEFS (embedded feature selection for multi-label learning) and MDDM (multi-label dimensionality reduction via dependence maximization), on the UCI Yeast gene functional dataset and an inquiry diagnosis dataset of coronary heart disease (CHD). Experimental results show that the proposed algorithm significantly improves performance. In particular, the improvement in the average precision of the classifier is up to 10.62% and 14.54%. The paper thus realizes syndrome inquiry modeling of CHD in TCM, providing a useful reference for the diagnosis of CHD and for the analysis of other multi-label data.
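
The abstract reports its gains in terms of average precision. As a rough illustration only, the sketch below computes the ranking-based average precision commonly used in the multi-label learning literature: for each instance, it averages, over the relevant labels, the fraction of relevant labels ranked at or above that label. The function name, the tie-breaking rule, and the handling of instances without any relevant label are assumptions made for this example; the abstract does not specify the exact formulation used in the paper.

```python
from typing import Sequence


def multilabel_average_precision(
    scores: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
) -> float:
    """Mean, over instances, of the per-instance ranking average precision.

    scores[i][j] is the classifier's confidence that label j applies to
    instance i; labels[i][j] is 1 if label j is truly relevant, else 0.
    """
    total = 0.0
    counted = 0
    for row_scores, row_labels in zip(scores, labels):
        relevant = [j for j, y in enumerate(row_labels) if y == 1]
        if not relevant:
            # Assumption: instances with no relevant label are skipped.
            continue
        # Rank 1 is the highest score. sorted() is stable, so ties are
        # broken by label index (an assumption of this sketch).
        order = sorted(range(len(row_scores)), key=lambda j: -row_scores[j])
        rank = {j: r + 1 for r, j in enumerate(order)}
        per_label = 0.0
        for j in relevant:
            # Relevant labels ranked at or above label j, divided by j's rank.
            hits = sum(1 for k in relevant if rank[k] <= rank[j])
            per_label += hits / rank[j]
        total += per_label / len(relevant)
        counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    # Hypothetical scores for two patients and three syndrome labels.
    example_scores = [[0.9, 0.2, 0.6], [0.1, 0.9, 0.5]]
    example_labels = [[1, 0, 1], [1, 0, 0]]
    print(multilabel_average_precision(example_scores, example_labels))
```

A value of 1 means every relevant label is ranked above every irrelevant one, so a feature subset that raises this value yields better label rankings.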

Author information

Corresponding author

Correspondence to GuoZheng Li.

About this article

Cite this article

Shao, H., Li, G., Liu, G. et al. Symptom selection for multi-label data of inquiry diagnosis in traditional Chinese medicine. Sci. China Inf. Sci. 56, 1–13 (2013). https://doi.org/10.1007/s11432-011-4406-5

  • DOI: https://doi.org/10.1007/s11432-011-4406-5
