ABSTRACT
Discrete event simulation is an established instrument for investigating the dynamic behavior of complex systems and evaluating processes. Usually, simulation experts conduct experiments for a predetermined system specification by manually varying parameters based on educated assumptions and a previously defined goal. As an alternative, data farming and knowledge discovery in simulation data are increasingly popular methods for uncovering unknown relationships and effects in the model and thereby gaining useful information about the underlying system. These methods usually demand a broad-scale, data-intensive experimental design, so computing time can quickly become large. To address this, we extend an existing concept of knowledge discovery in simulation data with an online stream mining component that delivers data mining results while experiments are still running. For this purpose, we introduce a method that combines decision tree classification with clustering algorithms to analyze simulation output data, treating the flow of experiments as a data stream. A prototypical implementation demonstrates the basic applicability of the concept and opens up many possibilities for future research.
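The idea of clustering simulation outputs and then classifying the input parameters against those clusters, while experiments are still arriving, can be illustrated with a minimal sketch. It is not the implementation described in the paper. All names (`run_experiment`, `OnlineKMeans1D`, `fit_stump`, the parameters `buffer_size` and `machines`) are hypothetical. The simulation is replaced by a toy function. A sequential one-dimensional k-means stands in for the clustering step, and a single-split decision stump, refitted on the data seen so far, stands in for a true streaming decision tree learner.

```python
import random


class OnlineKMeans1D:
    """Sequential k-means on a scalar output; one centroid update per record."""

    def __init__(self, initial_centroids):
        self.centroids = list(initial_centroids)
        self.counts = [0] * len(self.centroids)

    def learn_and_label(self, y):
        # Assign y to the nearest centroid, then move that centroid toward y
        # (running mean of all values assigned to it so far).
        k = min(range(len(self.centroids)), key=lambda i: abs(y - self.centroids[i]))
        self.counts[k] += 1
        self.centroids[k] += (y - self.centroids[k]) / self.counts[k]
        return k


def fit_stump(rows, labels):
    """Return (errors, feature, threshold, left_label, right_label) for the
    single split with the fewest misclassifications, or None if no split exists."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            left_label = max(set(left), key=left.count)
            right_label = max(set(right), key=right.count)
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best


def run_experiment(buffer_size, machines, rng):
    # Hypothetical stand-in for one simulation run: throughput jumps once
    # buffer_size exceeds 5; machines has no effect in this toy model.
    base = 100.0 if buffer_size > 5 else 40.0
    return base + rng.uniform(-5, 5)


rng = random.Random(0)
clusterer = OnlineKMeans1D([0.0, 120.0])  # assumed starting centroids
rows, labels = [], []
stump = None

for i in range(200):
    params = (rng.randint(1, 10), rng.randint(1, 4))  # (buffer_size, machines)
    throughput = run_experiment(*params, rng)
    labels.append(clusterer.learn_and_label(throughput))  # cluster id as class label
    rows.append(params)
    if (i + 1) % 50 == 0:
        # Intermediate result, available before all experiments have finished.
        stump = fit_stump(rows, labels)
        print(f"after {i + 1} runs: {stump}")
```

In this toy setting the stump should pick up the split on `buffer_size` early on, which mirrors the intent of obtaining interpretable rules before the full experiment design has been executed. A real setup would replace the refitted stump with an incremental tree learner so that past records do not have to be stored.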
Index Terms
- Online Analysis of Simulation Data with Stream-based Data Mining