An intelligent approach to data extraction and task identification for process mining

Information Systems Frontiers

Abstract

Business process mining has received increasing attention in recent years because it provides process insights by analyzing event logs generated by enterprise information systems. A key challenge in business process mining projects is extracting process-related data from massive event log databases, which requires rich domain knowledge and advanced database skills and can be labor-intensive and overwhelming. In this paper, we propose an intelligent approach to data extraction and task identification that leverages relevant process documents. In particular, we analyze those process documents using text mining techniques and use the results to identify the most relevant database tables for process mining. The novelty of our approach is that it formalizes data extraction and task identification as a problem of extracting attributes as process components, and relations among process components, using sequence kernel techniques. Our approach can reduce the effort and increase the accuracy of data extraction and task identification for process mining. A business expense reimbursement case is used to illustrate our approach.
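To give a feel for what a sequence kernel computes, the sketch below implements a generic gap-weighted subsequence kernel over word tokens by brute-force enumeration. It is an illustration only: the function name, the decay parameter `lam`, the subsequence length `n`, and the example phrases are assumptions made here, and the kernel the paper actually uses may differ.

```python
from itertools import combinations


def subsequence_kernel(s, t, n=2, lam=0.5):
    """Gap-weighted subsequence kernel over two token lists (brute force).

    Every length-n subsequence shared by s and t contributes
    lam ** (span in s) * lam ** (span in t), where the span is the distance
    from the first to the last matched token, inclusive. Gappy matches
    therefore count less than contiguous ones.
    """
    def weighted_subsequences(tokens):
        # Map each length-n subsequence to the summed weight of all the
        # index tuples that produce it.
        weights = {}
        for idx in combinations(range(len(tokens)), n):
            key = tuple(tokens[i] for i in idx)
            span = idx[-1] - idx[0] + 1
            weights[key] = weights.get(key, 0.0) + lam ** span
        return weights

    ws = weighted_subsequences(s)
    wt = weighted_subsequences(t)
    return sum(w * wt[key] for key, w in ws.items() if key in wt)


# Hypothetical phrases from a process document.
a = ["submit", "expense", "form"]
b = ["submit", "form"]
similarity = subsequence_kernel(a, b, n=2, lam=0.5)
```

Here the only shared pair is ("submit", "form"), which spans three tokens in the first phrase and two in the second, so the kernel value is 0.5 ** 5. The enumeration is exponential in `n` and only suitable for short phrases; practical implementations use dynamic programming.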

Acknowledgments

This research was partially supported by a JPMorgan Chase Fellowship from the Institute of Financial Services Analytics at the University of Delaware.

Author information

Correspondence to Harry Jiannan Wang.

Appendix A: Sample data for the illustrative case study

The example tables with data for the database shown in Figure 4 are included below.

Employee Table:

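| EID  | Name              | Department  |
|------|-------------------|-------------|
| E001 | Joe Wang          | MIS         |
| E002 | Nina Somers       | MIS         |
| E003 | Nancy Warren      | BS          |
| E004 | Jeff Jones        | MIS         |
| E005 | Linda Proctor     | BS          |
| E006 | Jennifer Brinkley | Procurement |
| E007 | Debra Berry       | Procurement |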

Expense Form Table:

| FID  | AccountNumber | BankAccount |
|------|---------------|-------------|
| F001 | BUEC001       | BOA001      |
| F002 | ACCT001       | WSFS001     |
| F003 | ACCT002       | WSFS001     |

Expense Item Table:

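| ItemID | Type          | Date    | Comments                                                 | Amount    | FID  |
|--------|---------------|---------|----------------------------------------------------------|-----------|------|
| I001   | Car Rental    | 8/10/06 | Car rental from Arizona to Delaware via United Van Lines | $3,540.00 | F001 |
| I002   | Hotel         | 7/21/07 | Hotel for CSWIM2007                                      | $350.00   | F002 |
| I003   | Registration  | 7/21/07 | Registration Fee for CSWIM2007                           | $150.00   | F002 |
| I004   | Meal          | 7/22/07 | Dinner                                                   | $25.00    | F002 |
| I005   | Miscellaneous | 5/22/07 | Chair for faculty member's office                        | $550.00   | F003 |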

Routing Table:

| RID  | Role                  | Time           | Comments | EID  | FID  |
|------|-----------------------|----------------|----------|------|------|
| R001 | Originator            | 9/8/06 10:37   | New MIS faculty relocation from Arizona to Delaware per agreement letter. | E002 | F001 |
| R002 | Supervisor            | 9/8/06 12:47   |          | E004 | F001 |
| R003 | Account Administrator | 9/11/06 10:36  |          | E003 | F001 |
| R004 | Approver              | 9/11/06 11:10  |          | E005 | F001 |
| R005 | Reject                | 9/15/06 9:48   | The charges for the shipment of a vehicle is not applicable for reimbursement under policy 3–11. A copy of the agreement letter is needed and any exception to policy must be signed by the dean and Provost office. | E006 | F001 |
| R006 | Resubmit              | 10/10/06 19:11 |          | E002 | F001 |
| R007 | Supervisor            | 10/11/06 8:25  |          | E004 | F001 |
| R008 | Account Administrator | 10/16/06 11:38 |          | E003 | F001 |
| R009 | Approver              | 10/16/06 12:28 |          | E005 | F001 |
| R010 | Procurement           | 10/16/06 16:36 |          | E006 | F001 |
| R011 | Originator            | 8/16/07 11:06  |          | E001 | F002 |
| R012 | Drafter               | 8/20/07 10:45  |          | E002 | F002 |
| R013 | Supervisor            | 8/20/07 13:12  |          | E004 | F002 |
| R014 | Account Administrator | 8/22/07 10:49  |          | E003 | F002 |
| R015 | Approver              | 8/22/07 11:18  |          | E005 | F002 |
| R016 | Procurement           | 8/30/07 14:06  |          | E007 | F002 |
| R017 | Originator            | 6/15/07 11:44  |          | E002 | F003 |
| R018 | Supervisor            | 6/15/07 14:31  |          | E004 | F003 |
| R019 | Account Admin         | 6/18/07 14:52  |          | E003 | F003 |
| R020 | Approver              | 6/18/07 14:59  |          | E005 | F003 |
| R021 | Procurement           | 8/20/07 16:55  | This BER is approved as an exception however no further BERs for goods will be approved. Please request a card or have someone with a Pro-card handle these transactions | E006 | F003 |
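As a rough illustration of how the Routing table could feed a process mining tool, the sketch below groups its rows into one trace per expense form. The mapping is an assumption made here, not something stated in the appendix: FID is treated as the case identifier, Role as the activity, and Time as the event timestamp. The function name `build_traces` is hypothetical, and the Comments column is omitted.

```python
from collections import defaultdict
from datetime import datetime

# (RID, Role, Time, EID, FID) rows copied from the Routing table above.
ROUTING = [
    ("R001", "Originator", "9/8/06 10:37", "E002", "F001"),
    ("R002", "Supervisor", "9/8/06 12:47", "E004", "F001"),
    ("R003", "Account Administrator", "9/11/06 10:36", "E003", "F001"),
    ("R004", "Approver", "9/11/06 11:10", "E005", "F001"),
    ("R005", "Reject", "9/15/06 9:48", "E006", "F001"),
    ("R006", "Resubmit", "10/10/06 19:11", "E002", "F001"),
    ("R007", "Supervisor", "10/11/06 8:25", "E004", "F001"),
    ("R008", "Account Administrator", "10/16/06 11:38", "E003", "F001"),
    ("R009", "Approver", "10/16/06 12:28", "E005", "F001"),
    ("R010", "Procurement", "10/16/06 16:36", "E006", "F001"),
    ("R011", "Originator", "8/16/07 11:06", "E001", "F002"),
    ("R012", "Drafter", "8/20/07 10:45", "E002", "F002"),
    ("R013", "Supervisor", "8/20/07 13:12", "E004", "F002"),
    ("R014", "Account Administrator", "8/22/07 10:49", "E003", "F002"),
    ("R015", "Approver", "8/22/07 11:18", "E005", "F002"),
    ("R016", "Procurement", "8/30/07 14:06", "E007", "F002"),
    ("R017", "Originator", "6/15/07 11:44", "E002", "F003"),
    ("R018", "Supervisor", "6/15/07 14:31", "E004", "F003"),
    ("R019", "Account Admin", "6/18/07 14:52", "E003", "F003"),
    ("R020", "Approver", "6/18/07 14:59", "E005", "F003"),
    ("R021", "Procurement", "8/20/07 16:55", "E006", "F003"),
]


def build_traces(rows):
    """Group routing rows by form ID and order each group by timestamp."""
    events = defaultdict(list)
    for _rid, role, time, eid, fid in rows:
        timestamp = datetime.strptime(time, "%m/%d/%y %H:%M")
        events[fid].append((timestamp, role, eid))
    # A trace is the time-ordered list of activities (roles) for one case.
    return {
        fid: [role for _, role, _ in sorted(evs, key=lambda e: e[0])]
        for fid, evs in events.items()
    }


traces = build_traces(ROUTING)
for fid, trace in sorted(traces.items()):
    print(fid, " -> ".join(trace))
```

Under this mapping, form F001 yields a ten-step trace containing a Reject and Resubmit loop, while F002 and F003 follow shorter paths. Row R019 records the role as "Account Admin" rather than "Account Administrator", so a real extraction step would likely need to normalize activity labels before mining.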

Cite this article

Li, J., Wang, H.J. & Bai, X. An intelligent approach to data extraction and task identification for process mining. Inf Syst Front 17, 1195–1208 (2015). https://doi.org/10.1007/s10796-015-9564-3

DOI: https://doi.org/10.1007/s10796-015-9564-3