research-article

Incremental Feature Spaces Learning with Label Scarcity

Published: 08 September 2022

Abstract

Recently, learning and mining from data streams with incremental feature spaces have attracted extensive attention; in such streams, data may expand over time in both volume and feature dimensionality. Existing approaches usually assume that every incoming instance receives its true label. However, in many real-world applications, e.g., environment monitoring, acquiring true labels is costly because annotating the data requires human effort. To tackle this problem, we propose a novel incremental Feature spaces Learning with Label Scarcity (FLLS) algorithm, together with two variants. When data streams arrive with augmented features, we first leverage margin-based online active learning to select valuable instances for labeling, and thus build superior predictive models with minimal supervision. After receiving the labels, we combine the online passive-aggressive update rule and the margin-maximum principle to jointly update the dynamic classifier in the shared and augmented feature spaces. Finally, we use the projected truncation technique to build a sparse yet efficient model. We theoretically analyze the error bounds of FLLS and its two variants, and we conduct experiments on synthetic data and real-world applications to validate the effectiveness of the proposed algorithms.
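The abstract names three generic ingredients: margin-based online active learning, the passive-aggressive update, and projected truncation on a feature space that grows over time. The Python sketch below illustrates textbook versions of these ingredients under stated assumptions; it is not the authors' FLLS algorithm. In particular, it keeps a single weight vector instead of separate classifiers for the shared and augmented feature spaces, it queries a label with probability delta / (delta + |p|) in the style of Cesa-Bianchi et al.'s selective sampling, it uses the PA-I step size, and it implements "projected truncation" as projection onto an L2 ball followed by keeping the `budget` largest-magnitude weights. The class name and the parameters `delta`, `C`, `radius`, and `budget` are hypothetical.

```python
import numpy as np


class StreamingActiveLearnerSketch:
    """Hypothetical sketch (not the FLLS algorithm itself): margin-based
    selective sampling + PA-I update + projected truncation, on one linear
    model whose weight vector grows when augmented features arrive."""

    def __init__(self, delta=1.0, C=1.0, radius=10.0, budget=10, seed=0):
        self.w = np.zeros(0)      # weights over all features seen so far
        self.delta = delta        # sampling parameter: larger -> more queries
        self.C = C                # PA-I aggressiveness cap
        self.radius = radius      # L2-ball radius used in the projection
        self.budget = budget      # max number of non-zero weights kept
        self.rng = np.random.default_rng(seed)
        self.queries = 0          # number of labels requested so far

    def _align(self, x):
        # Augmented features: pad w with zeros for newly seen dimensions.
        if x.size > self.w.size:
            self.w = np.concatenate([self.w, np.zeros(x.size - self.w.size)])
        # Assumption: features absent from x are treated as zero.
        if x.size < self.w.size:
            x = np.concatenate([x, np.zeros(self.w.size - x.size)])
        return x

    def _project_truncate(self):
        # Project onto the L2 ball of the given radius ...
        norm = np.linalg.norm(self.w)
        if norm > self.radius:
            self.w *= self.radius / norm
        # ... then zero out all but the `budget` largest-magnitude weights.
        if np.count_nonzero(self.w) > self.budget:
            drop = np.argsort(np.abs(self.w))[: self.w.size - self.budget]
            self.w[drop] = 0.0

    def step(self, x, label_oracle):
        """Predict on x; maybe query label_oracle() and update. Returns +1/-1."""
        x = self._align(np.asarray(x, dtype=float))
        p = float(self.w @ x)                 # signed margin
        y_hat = 1 if p >= 0 else -1
        # Query with probability delta / (delta + |p|): small margins are
        # uncertain, so they are the most likely to be labeled.
        if self.rng.random() < self.delta / (self.delta + abs(p)):
            self.queries += 1
            y = label_oracle()
            loss = max(0.0, 1.0 - y * p)      # hinge loss
            sq = float(x @ x)
            if loss > 0.0 and sq > 0.0:
                tau = min(self.C, loss / sq)  # PA-I step size
                self.w += tau * y * x
                self._project_truncate()
        return y_hat
```

In use, a stream would call `step` once per arriving instance and pass a callable that asks the annotator for a label; only instances with small margins are likely to trigger that call, which is how selective sampling keeps the labeling cost low. Because the weight vector is padded with zeros when a longer instance arrives, the model trained on the old features is reused unchanged for the shared dimensions.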

      • Published in

        ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 6
        December 2022
        631 pages
        ISSN: 1556-4681
        EISSN: 1556-472X
        DOI: 10.1145/3543989

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 8 September 2022
        • Online AM: 27 June 2022
        • Accepted: 1 January 2022
        • Revised: 1 December 2021
        • Received: 1 September 2021
        Published in TKDD Volume 16, Issue 6


        Qualifiers

        • research-article
        • Refereed
