Abstract
Active learning selects the most informative unlabeled samples for manual annotation in order to enlarge the training set. Many active learning methods have been proposed, but most of them assume that labeled examples are available for every class. Only a few methods handle positive and unlabeled data, and their computational complexity is high, so they do not scale well to big data. In this paper, we propose an active learning approach that works well when only a small number of positive samples are available in a large data set. We use data preprocessing to remove most of the outliers, which simplifies the density calculation relative to the KNN algorithm, and our proposed sample selection strategy, Min-Uncertainty Density (MDD), helps select unlabeled samples with higher uncertainty and higher density at lower computational cost. We also propose a combined semi-supervised and active learning technique (MDD-SSAL) that automatically annotates some confidently classified unlabeled samples in each iteration, reducing the number of samples that must be annotated manually. Experimental results indicate that the proposed method is competitive with other similar methods.
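The abstract does not give the MDD scoring formula, so the following is only a minimal sketch of the general idea of combining uncertainty with density and then auto-labeling confident samples. Everything specific here is an assumption for illustration: the function names, the product of uncertainty and density as the score, the fixed-radius neighbor count used in place of KNN, and the 0.95 confidence threshold are all hypothetical and are not taken from the paper.

```python
import math

def uncertainty(p_pos):
    """Assumed measure: 1.0 at p = 0.5 (least certain), 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p_pos - 1.0)

def local_density(points, radius):
    """Assumed density: fraction of the other points within `radius`.

    A fixed-radius count stands in for a KNN-based density; the paper's
    actual simplified density calculation is not specified in the abstract.
    """
    n = len(points)
    dens = []
    for i, a in enumerate(points):
        close = sum(
            1 for j, b in enumerate(points)
            if j != i and math.dist(a, b) <= radius
        )
        dens.append(close / (n - 1) if n > 1 else 0.0)
    return dens

def select_queries(points, p_pos, k, radius):
    """Return indices of the k samples with the highest uncertainty * density."""
    dens = local_density(points, radius)
    scores = [uncertainty(p) * d for p, d in zip(p_pos, dens)]
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def auto_label(p_pos, skip, threshold=0.95):
    """Self-training step: label samples the classifier is confident about.

    `skip` holds indices already sent for manual annotation.
    Returns {index: label} with label 1 (positive) or 0 (negative).
    """
    labels = {}
    for i, p in enumerate(p_pos):
        if i in skip:
            continue
        if p >= threshold:
            labels[i] = 1
        elif p <= 1.0 - threshold:
            labels[i] = 0
    return labels

# Toy data: three clustered points and one isolated outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
p_pos = [0.5, 0.9, 0.99, 0.5]  # hypothetical classifier outputs

queried = select_queries(points, p_pos, k=1, radius=0.5)  # [0]
confident = auto_label(p_pos, skip=set(queried))          # {2: 1}
```

In this toy run the outlier at (5, 5) is as uncertain as the first point, but its density is zero, so it is not queried; the density term is what keeps isolated samples out of the annotation budget.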
Acknowledgments
Supported by the National Science and Technology Major Project (2018ZX03001019-003) and the National Natural Science Foundation of China (Grant No. 61372088).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Luo, J., Zhou, W., Du, Y. (2018). An Active Learning Based on Uncertainty and Density Method for Positive and Unlabeled Data. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11334. Springer, Cham. https://doi.org/10.1007/978-3-030-05051-1_16
DOI: https://doi.org/10.1007/978-3-030-05051-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05050-4
Online ISBN: 978-3-030-05051-1
eBook Packages: Computer Science, Computer Science (R0)