Improvised methods for tackling big data stream mining challenges: case study of human activity recognition

Fong, Simon; Liu, Kexing; Cho, Kyungeun; Wong, Raymond; Mohammed, Sabah; Fiaidhi, Jinan

doi:10.1007/s11227-016-1639-5

Published: 16 February 2016

Volume 72, pages 3927–3959, (2016)

The Journal of Supercomputing

Abstract

Big data streams may be a new hype, but they pose a practical computational challenge, since data streams are now prevalent in applications. It is well known that data streams originating from monitoring sensors accumulate continuously to very large volumes, making traditional batch-based model induction algorithms infeasible for real-time data mining or just-in-time data analytics. In this position paper, following a new data stream mining methodology, namely stream-based holistic analytics and reasoning in parallel (SHARP), we examine a list of data analytic challenges together with improvised methods for addressing them. In particular, two types of decision tree algorithms, batch-mode and incremental-mode, are tested on sensor data that represents a typical big data stream. We investigate whether, and to what extent, two improvised methods (outlier removal and balancing of imbalanced class distributions) affect prediction performance in big data stream mining. SHARP is founded on incremental learning, which does not require all the training data to be loaded into memory. This fundamental concept needs to be supported not only by the decision tree algorithms but also by the other improvised methods, which are usually applied at the preprocessing stage. This paper sheds some light on this area, which data analysts often overlook when it comes to big data stream mining.
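To illustrate what it means for a preprocessing step to be incremental, the following is a minimal Python sketch of outlier removal that never holds the whole stream in memory. It is not the authors' method: the z-score rule, Welford's online mean and variance update, and the names `RunningStats`, `filter_outliers`, `z_max`, and `warmup` are all assumptions chosen for illustration.

```python
import math
from typing import Iterable, Iterator


class RunningStats:
    """Welford's online algorithm: mean and variance in O(1) memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until two values have been seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def filter_outliers(
    stream: Iterable[float], z_max: float = 3.0, warmup: int = 10
) -> Iterator[float]:
    """Yield readings whose z-score against the running statistics is within z_max.

    The first `warmup` readings always pass through, because the running
    estimates are not yet reliable. (Hypothetical rule, for illustration only.)
    """
    stats = RunningStats()
    for x in stream:
        is_outlier = (
            stats.n >= warmup
            and stats.std > 0.0
            and abs(x - stats.mean) > z_max * stats.std
        )
        if not is_outlier:
            stats.update(x)  # only accepted readings update the statistics
            yield x
```

Because `filter_outliers` is a generator, it can sit in front of an incremental learner and pass readings on one at a time. One design choice to note: rejected readings do not update the running statistics, which keeps extreme values from inflating the variance but may also make the filter slow to adapt if the stream genuinely drifts.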

Acknowledgments

The authors are thankful for the financial support from the research grant “Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF)”, Grant No. MYRG2015-00128-FST, offered by the University of Macau, FST, and RDAO.

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Macau, Macau, SAR, China
Simon Fong & Kexing Liu
Department of Multimedia Engineering, Dongguk University, Seoul, Korea
Kyungeun Cho
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Raymond Wong
Department of Computer Science, Lakehead University, Thunder Bay, Canada
Sabah Mohammed & Jinan Fiaidhi

Corresponding author

Correspondence to Simon Fong.

About this article

Cite this article

Fong, S., Liu, K., Cho, K. et al. Improvised methods for tackling big data stream mining challenges: case study of human activity recognition. J Supercomput 72, 3927–3959 (2016). https://doi.org/10.1007/s11227-016-1639-5

Published: 16 February 2016
Issue Date: October 2016
DOI: https://doi.org/10.1007/s11227-016-1639-5
