Abstract
Business process mining has received increasing attention in recent years because it provides process insights by analyzing event logs generated by enterprise information systems. A key challenge in business process mining projects is extracting process-related data from massive event log databases, a task that requires rich domain knowledge and advanced database skills and can be highly labor-intensive. In this paper, we propose an intelligent approach to data extraction and task identification that leverages relevant process documents. In particular, we analyze those process documents using text mining techniques and use the results to identify the database tables most relevant to process mining. The novelty of our approach lies in formalizing data extraction and task identification as the problem of extracting attributes as process components, together with the relations among those components, using sequence kernel techniques. Our approach can reduce the effort and increase the accuracy of data extraction and task identification for process mining. A business expense reimbursement case is used to illustrate our approach.
Acknowledgments
This research was partially supported by a JPMorgan Chase Fellowship from the Institute of Financial Services Analytics at the University of Delaware.
Appendix A: Sample data for the illustrative case study
The example tables with data for the database shown in Figure 4 are included below.
Employee Table:
EID | Name | Department |
E001 | Joe Wang | MIS |
E002 | Nina Somers | MIS |
E003 | Nancy Warren | BS |
E004 | Jeff Jones | MIS |
E005 | Linda Proctor | BS |
E006 | Jennifer Brinkley | Procurement |
E007 | Debra Berry | Procurement |
Expense Form Table:
FID | AccountNumber | BankAccount |
F001 | BUEC001 | BOA001 |
F002 | ACCT001 | WSFS001 |
F003 | ACCT002 | WSFS001 |
Expense Item Table:
ItemID | Type | Date | Comments | Amount | FID |
I001 | Car Rental | 8/10/06 | Car rental from Arizona to Delaware via United Van Lines | $3,540.00 | F001 |
I002 | Hotel | 7/21/07 | Hotel for CSWIM2007 | $350.00 | F002 |
I003 | Registration | 7/21/07 | Registration Fee for CSWIM2007 | $150.00 | F002 |
I004 | Meal | 7/22/07 | Dinner | $25.00 | F002 |
I005 | Miscellaneous | 5/22/07 | Chair for faculty member's office | $550.00 | F003 |
Routing Table:
RID | Role | Time | Comments | EID | FID |
R001 | Originator | 9/8/06 10:37 | New MIS faculty relocation from Arizona to Delaware per agreement letter. | E002 | F001 |
R002 | Supervisor | 9/8/06 12:47 | | E004 | F001 |
R003 | Account Administrator | 9/11/06 10:36 | | E003 | F001 |
R004 | Approver | 9/11/06 11:10 | | E005 | F001 |
R005 | Reject | 9/15/06 9:48 | The charges for the shipment of a vehicle is not applicable for reimbursement under policy 3–11. A copy of the agreement letter is needed and any exception to policy must be signed by the dean and Provost office. | E006 | F001 |
R006 | Resubmit | 10/10/06 19:11 | | E002 | F001 |
R007 | Supervisor | 10/11/06 8:25 | | E004 | F001 |
R008 | Account Administrator | 10/16/06 11:38 | | E003 | F001 |
R009 | Approver | 10/16/06 12:28 | | E005 | F001 |
R010 | Procurement | 10/16/06 16:36 | | E006 | F001 |
R011 | Originator | 8/16/07 11:06 | | E001 | F002 |
R012 | Drafter | 8/20/07 10:45 | | E002 | F002 |
R013 | Supervisor | 8/20/07 13:12 | | E004 | F002 |
R014 | Account Administrator | 8/22/07 10:49 | | E003 | F002 |
R015 | Approver | 8/22/07 11:18 | | E005 | F002 |
R016 | Procurement | 8/30/07 14:06 | | E007 | F002 |
R017 | Originator | 6/15/07 11:44 | | E002 | F003 |
R018 | Supervisor | 6/15/07 14:31 | | E004 | F003 |
R019 | Account Admin | 6/18/07 14:52 | | E003 | F003 |
R020 | Approver | 6/18/07 14:59 | | E005 | F003 |
R021 | Procurement | 8/20/07 16:55 | This BER is approved as an exception however no further BERs for goods will be approved. Please request a card or have someone with a Pro-card handle these transactions | E006 | F003 |
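To show how the sample tables relate to an event log, the following is a minimal sketch, not the extraction method proposed in the paper. It assumes that each expense form (FID) is treated as one process case and that each Routing row is one event, ordered by its Time value. The names `ROUTING`, `EMPLOYEES`, and `build_traces` are hypothetical, and only the Routing rows for forms F002 and F003 are copied in, with the Comments column omitted.

```python
from collections import defaultdict
from datetime import datetime

# Subset of the Routing table above: (RID, Role, Time, EID, FID).
# Rows are deliberately listed out of order to show that sorting is by time.
ROUTING = [
    ("R017", "Originator", "6/15/07 11:44", "E002", "F003"),
    ("R018", "Supervisor", "6/15/07 14:31", "E004", "F003"),
    ("R019", "Account Admin", "6/18/07 14:52", "E003", "F003"),
    ("R020", "Approver", "6/18/07 14:59", "E005", "F003"),
    ("R021", "Procurement", "8/20/07 16:55", "E006", "F003"),
    ("R016", "Procurement", "8/30/07 14:06", "E007", "F002"),
    ("R011", "Originator", "8/16/07 11:06", "E001", "F002"),
    ("R012", "Drafter", "8/20/07 10:45", "E002", "F002"),
    ("R013", "Supervisor", "8/20/07 13:12", "E004", "F002"),
    ("R014", "Account Administrator", "8/22/07 10:49", "E003", "F002"),
    ("R015", "Approver", "8/22/07 11:18", "E005", "F002"),
]

# Employee table above, reduced to EID -> Name.
EMPLOYEES = {
    "E001": "Joe Wang",
    "E002": "Nina Somers",
    "E003": "Nancy Warren",
    "E004": "Jeff Jones",
    "E005": "Linda Proctor",
    "E006": "Jennifer Brinkley",
    "E007": "Debra Berry",
}


def build_traces(routing, employees):
    """Group routing events by form ID and order each group by timestamp.

    Assumption: one form (FID) corresponds to one process case.
    Returns {FID: [(role, employee_name), ...]} in chronological order.
    """
    cases = defaultdict(list)
    for _rid, role, time_text, eid, fid in routing:
        timestamp = datetime.strptime(time_text, "%m/%d/%y %H:%M")
        cases[fid].append((timestamp, role, employees.get(eid, eid)))
    return {
        fid: [(role, name) for _ts, role, name in sorted(events, key=lambda e: e[0])]
        for fid, events in cases.items()
    }


traces = build_traces(ROUTING, EMPLOYEES)
for fid in sorted(traces):
    print(fid, " -> ".join(role for role, _name in traces[fid]))
```

Joining Routing to Employee on EID attaches a performer to each event, and grouping on FID yields one trace per expense form. A real extraction would also need to decide how to treat the Comments column and inconsistent role labels such as "Account Admin" versus "Account Administrator".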
Cite this article
Li, J., Wang, H.J. & Bai, X. An intelligent approach to data extraction and task identification for process mining. Inf Syst Front 17, 1195–1208 (2015). https://doi.org/10.1007/s10796-015-9564-3
DOI: https://doi.org/10.1007/s10796-015-9564-3