Abstract
The advent of the Big Data era drives data analysts from different domains to use data mining techniques for data analysis. However, performing data analysis in a specific domain is not trivial; it often requires complex task configuration, onerous integration of algorithms, and efficient execution in distributed environments. Little effort has been devoted to developing effective tools that help data analysts conduct complex data analysis tasks. In this paper, we design and implement FIU-Miner, a Fast, Integrated, and User-friendly system to ease data analysis. FIU-Miner allows users to rapidly configure a complex data analysis task without writing a single line of code. It also helps users conveniently import and integrate different analysis programs. Further, it balances resource utilization and task execution in heterogeneous environments. Case studies of real-world applications demonstrate the effectiveness of the proposed system.
Acknowledgements
We would like to thank the following former members of the Knowledge Discovery Research Group (KDRG) at FIU: Dr. Li Zheng, Dr. Lei Li, Dr. Yexi Jiang, Dr. Liang Tang, Dr. Chao Shen, and Dr. Jingxuan Li, for their contributions to the FIU-Miner project. We would also like to thank the High Performance Database Research Center at FIU for its cooperation on spatial data analysis. This project was partially supported by the National Science Foundation under Grants HRD-0833093, CNS-1126619, IIS-1213026, and CNS-1461926, the US Department of Homeland Security’s VACCINE Center under Award Number 2009-ST-061-CI0001, Nanjing University of Posts and Telecommunications under Grants NY214135 and NY215045, the Scientific and Technological Support Project (Society) of Jiangsu Province under Grant No. BE2016776, the Chinese National Natural Science Foundation under Grant 91646116, and an FIU Dissertation Year Fellowship.
Cite this article
Li, T., Zeng, C., Zhou, W. et al. FIU-Miner (a fast, integrated, and user-friendly system for data mining) and its applications. Knowl Inf Syst 52, 411–443 (2017). https://doi.org/10.1007/s10115-016-1014-0
DOI: https://doi.org/10.1007/s10115-016-1014-0