
Khanan: Performance Comparison and Programming α-Miner Algorithm in Column-Oriented and Relational Database Query Languages

  • Conference paper
Big Data Analytics (BDA 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9498)


Abstract

Process-Aware Information Systems (PAIS) support business processes and generate large amounts of event logs as those processes execute. Each event in an event log is represented as a tuple of CaseID, Timestamp, Activity and Actor. Process Mining is an emerging field that analyzes event logs to discover, enhance and improve business processes and to check conformance between run-time and design-time business processes. The large volumes of event logs generated are stored in databases. Relational databases perform well for certain classes of applications but do not scale well for others; NoSQL database systems emerged to address these scalability challenges. Discovering a process model (workflow) from event logs is one of the most challenging and important Process Mining tasks, and the α-miner algorithm is one of the first and most widely used Process Discovery techniques. Our objective is to investigate which class of database (relational or NoSQL) performs better for a Process Discovery application within Process Mining. We implement the α-miner algorithm in the query languages of a relational (row-oriented) database and a NoSQL (column-oriented) database, so that the application is tightly coupled to the database, and we benchmark and compare its performance on both. We present the comparison on several aspects: the time taken to load large datasets, disk usage, stepwise execution time and compression technique.
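To make the first step of the α-miner concrete, the following is a minimal sketch, not the authors' implementation, of storing an event log of (CaseID, Timestamp, Activity, Actor) tuples in a relational table and deriving the directly-follows relation and the footprint relations (causality, parallelism, choice) from it with SQL. It assumes Python's built-in `sqlite3` module as a stand-in for the relational database used in the paper; the table name `event_log`, the toy traces and all helper function names are hypothetical. The later α-miner steps (building maximal pairs of activity sets and places) are omitted.

```python
import sqlite3
from itertools import product

# Hypothetical toy event log: (CaseID, Timestamp, Activity, Actor) tuples.
EVENTS = [
    (1, 1, "A", "u1"), (1, 2, "B", "u2"), (1, 3, "C", "u3"), (1, 4, "D", "u1"),
    (2, 1, "A", "u1"), (2, 2, "C", "u3"), (2, 3, "B", "u2"), (2, 4, "D", "u1"),
    (3, 1, "A", "u1"), (3, 2, "E", "u4"), (3, 3, "D", "u1"),
]


def load_log(conn, events):
    """Store the event log in a row-oriented relational table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE event_log (case_id INTEGER, ts INTEGER, activity TEXT, actor TEXT)"
    )
    conn.executemany("INSERT INTO event_log VALUES (?, ?, ?, ?)", events)


def directly_follows(conn):
    """Pairs (a, b) such that b immediately follows a in at least one case (a > b)."""
    rows = conn.execute(
        """
        SELECT DISTINCT e1.activity, e2.activity
        FROM event_log AS e1
        JOIN event_log AS e2
          ON e1.case_id = e2.case_id AND e2.ts > e1.ts
        WHERE NOT EXISTS (          -- no event of the same case lies in between
            SELECT 1 FROM event_log AS e3
            WHERE e3.case_id = e1.case_id AND e3.ts > e1.ts AND e3.ts < e2.ts
        )
        """
    ).fetchall()
    return set(rows)


def start_end_activities(conn):
    """T_I and T_O: activities that start or end some case."""
    start = {r[0] for r in conn.execute(
        "SELECT e.activity FROM event_log AS e WHERE e.ts = "
        "(SELECT MIN(ts) FROM event_log WHERE case_id = e.case_id)")}
    end = {r[0] for r in conn.execute(
        "SELECT e.activity FROM event_log AS e WHERE e.ts = "
        "(SELECT MAX(ts) FROM event_log WHERE case_id = e.case_id)")}
    return start, end


def footprint(conn):
    """Label each activity pair: '->' causal, '<-' reverse causal, '||' parallel, '#' unrelated."""
    activities = [r[0] for r in conn.execute(
        "SELECT DISTINCT activity FROM event_log ORDER BY activity")]
    df = directly_follows(conn)
    relations = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in df, (b, a) in df
        if ab and not ba:
            relations[(a, b)] = "->"
        elif ba and not ab:
            relations[(a, b)] = "<-"
        elif ab and ba:
            relations[(a, b)] = "||"
        else:
            relations[(a, b)] = "#"
    return activities, relations


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_log(conn, EVENTS)
    acts, rel = footprint(conn)
    print(rel[("A", "B")], rel[("B", "C")], rel[("B", "E")])  # -> || #
    print(start_end_activities(conn))  # ({'A'}, {'D'})
```

Expressing the directly-follows step as a self-join with `NOT EXISTS` keeps the work inside the database, in the spirit of the tightly coupled design described above; a column-oriented store would need a different query formulation.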



Author information

Corresponding author

Correspondence to Ashish Sureka.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Sachdev, A., Gupta, K., Sureka, A. (2015). Khanan: Performance Comparison and Programming α-Miner Algorithm in Column-Oriented and Relational Database Query Languages. In: Kumar, N., Bhatnagar, V. (eds) Big Data Analytics. BDA 2015. Lecture Notes in Computer Science, vol 9498. Springer, Cham. https://doi.org/10.1007/978-3-319-27057-9_12


  • DOI: https://doi.org/10.1007/978-3-319-27057-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27056-2

  • Online ISBN: 978-3-319-27057-9

  • eBook Packages: Computer Science, Computer Science (R0)
