DOI: 10.1145/3064911.3064915
research-article

Online Analysis of Simulation Data with Stream-based Data Mining

Published: 16 May 2017

ABSTRACT

Discrete event simulation is an accepted instrument for investigating the dynamic behavior of complex systems and evaluating processes. Usually, simulation experts conduct experiments for a predetermined system specification by manually varying parameters based on educated assumptions and according to a previously defined goal. As an alternative, data farming and knowledge discovery in simulation data are increasingly popular methods for uncovering unknown relationships and effects in the model and thereby gaining useful information about the underlying system. These methods usually demand a broad-scale, data-intensive experimental design, so computing time can quickly become large. To address this, we extend an existing concept of knowledge discovery in simulation data with an online stream mining component that delivers data mining results while experiments are still running. For this purpose, we introduce a method that combines decision tree classification with clustering algorithms to analyze simulation output data, treating the flow of experiments as a data stream. A prototypical implementation demonstrates the basic applicability of the concept and opens up many possibilities for future research.
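To make the idea of treating finished experiments as a data stream more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each completed experiment yields a numeric output vector, that outputs are grouped with sequential (online) k-means, and that the resulting cluster index could then serve as the class label for a stream decision tree. The abstract names no specific algorithms, so the class `OnlineKMeans` and the function `hoeffding_bound` are hypothetical illustrations; the Hoeffding bound is shown only as one common criterion that stream decision trees use to decide when enough samples have arrived to justify a split.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with the given range lies within this epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class OnlineKMeans:
    """Sequential k-means (assumed clustering step, not from the paper).

    Each arriving point is assigned to its nearest centroid, and that
    centroid is moved toward the point with step size 1 / (points seen
    by that centroid so far).
    """

    def __init__(self, initial_centroids):
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def update(self, point):
        # Nearest centroid by squared Euclidean distance.
        idx = min(
            range(len(self.centroids)),
            key=lambda i: sum(
                (p - c) ** 2 for p, c in zip(point, self.centroids[i])
            ),
        )
        self.counts[idx] += 1
        rate = 1.0 / self.counts[idx]
        self.centroids[idx] = [
            c + rate * (p - c) for p, c in zip(point, self.centroids[idx])
        ]
        # The returned index can be used as a class label downstream.
        return idx
```

In such a setup, each finished simulation run would call `update` with its output vector, and the returned cluster index, paired with the run's input parameters, would be passed to an incremental classifier. Because the bound shrinks as `n` grows, a tree learner using it can commit to a split as soon as the observed advantage of the best attribute exceeds `hoeffding_bound(...)`, without waiting for the full experiment design to finish.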

Published in

SIGSIM-PADS '17: Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
May 2017
278 pages
ISBN: 9781450344890
DOI: 10.1145/3064911
General Chairs: Wentong Cai, Teo Yong Meng
Program Chairs: Philip Wilsey, Kevin Jin

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 398 of 779 submissions, 51%
