A new co-training-style random forest for computer aided diagnosis

Published in: Journal of Intelligent Information Systems

Abstract

Machine learning techniques used in computer-aided diagnosis (CAD) systems learn a hypothesis that helps medical experts make diagnoses in the future. Learning a well-performing hypothesis requires a large number of expert-diagnosed examples, which places a heavy burden on the experts. By exploiting large numbers of undiagnosed examples together with the power of ensemble learning, the co-training-style random forest (Co-Forest) relieves this burden and produces well-performing hypotheses. However, Co-Forest may suffer from a problem common to other co-training-style algorithms: some unlabeled examples may be assigned wrong labels, and these mislabeled examples then accumulate during the training process. This is because the limited number of originally labeled examples usually produces poor component classifiers that lack both diversity and accuracy. In this paper, a new Co-Forest algorithm named Co-Forest with Adaptive Data Editing (ADE-Co-Forest) is proposed. It not only exploits a specific data-editing technique to identify and discard possibly mislabeled examples throughout the co-labeling iterations, but also employs an adaptive strategy to decide, case by case, whether to trigger the editing operation. The adaptive strategy combines five preconditional theorems, each of which ensures an iterative reduction of classification error and an increase in the size of the new training set under PAC learning theory. Experiments on UCI datasets and an application to the detection of small pulmonary nodules in chest CT images show that ADE-Co-Forest enhances the performance of a learned hypothesis more effectively than Co-Forest and DE-Co-Forest (Co-Forest with data editing but without the adaptive strategy).
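To make the procedure concrete, the sketch below illustrates one co-labeling round in the spirit of ADE-Co-Forest: each tree's concomitant ensemble (all other trees) labels the unlabeled pool, confidently labeled examples are kept, and an editing step discards newly labeled points whose nearest originally-labeled neighbors disagree with the assigned label. This is a minimal illustration, not the paper's algorithm: the names (CONF_THRESHOLD, edit_new_examples, co_forest_round) are hypothetical, binary 0/1 labels and an ENN-style rule stand in for the paper's specific editing technique, and the five preconditional theorems that govern when editing is triggered are reduced to a single boolean flag.

```python
# Minimal sketch of one co-labeling round in the spirit of ADE-Co-Forest.
# All names are hypothetical; binary 0/1 labels are assumed for brevity, and
# the paper's adaptive editing trigger is reduced to a `use_editing` flag.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

CONF_THRESHOLD = 0.75  # assumed confidence threshold (theta in Co-Forest)

def concomitant_vote(trees, skip, X):
    """Label X with the concomitant ensemble of tree `skip` (all other trees);
    confidence is the fraction of those trees agreeing with the majority label."""
    votes = np.array([t.predict(X) for j, t in enumerate(trees) if j != skip])
    labels = np.round(votes.mean(axis=0)).astype(int)  # majority for 0/1 labels
    confidence = (votes == labels).mean(axis=0)
    return labels, confidence

def edit_new_examples(X_l, y_l, X_new, y_new, k=3):
    """ENN-style data editing: keep a newly labeled example only if its k
    nearest originally-labeled neighbors agree with its assigned label."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_l, y_l)
    keep = knn.predict(X_new) == y_new
    return X_new[keep], y_new[keep]

def co_forest_round(trees, X_l, y_l, X_u, use_editing):
    """One co-labeling iteration; `use_editing` stands in for the adaptive
    strategy that decides whether the editing operation is triggered."""
    new_trees = []
    for i in range(len(trees)):
        labels, conf = concomitant_vote(trees, i, X_u)
        mask = conf >= CONF_THRESHOLD            # confidently labeled examples
        X_new, y_new = X_u[mask], labels[mask]
        if use_editing and len(X_new):
            X_new, y_new = edit_new_examples(X_l, y_l, X_new, y_new)
        # retrain this component tree on labeled + (edited) newly labeled data
        X_i, y_i = np.vstack([X_l, X_new]), np.hstack([y_l, y_new])
        new_trees.append(DecisionTreeClassifier(max_features="sqrt").fit(X_i, y_i))
    return new_trees
```

A full implementation would additionally maintain per-tree error estimates and accept a round only when the estimated error times the weight of the enlarged training set decreases, which is what the error-reduction preconditions require; the flag above glosses over that bookkeeping.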


Acknowledgements

This work is supported by the National Science Foundation of China under Grant Nos. 60702033, 60772076, and 2007307000189; the National High-Tech Research and Development Plan of China under Grant No. 2007AA01Z171; the Heilongjiang Science Foundation Key Project under Grant No. ZJG0705; and the Science Foundation for Distinguished Young Scholars of Heilongjiang Province, China, under Grant No. JC200611. The authors thank their partners at the 2nd Affiliated Hospital of Harbin Medical University for collecting and labeling the CT images.

Corresponding author

Correspondence to Chao Deng.


Cite this article

Deng, C., Guo, M.Z. A new co-training-style random forest for computer aided diagnosis. J Intell Inf Syst 36, 253–281 (2011). https://doi.org/10.1007/s10844-009-0105-8
