DOI: 10.1145/2588555.2588570
research-article

Matching heterogeneous event data

Published: 18 June 2014

Abstract

Identifying duplicate events is essential to various business process applications, such as provenance querying and process mining. Distinct features of heterogeneous events, including opaque names, dislocated traces, and composite events, prevent existing data integration techniques from performing well. To address these issues, we propose an event similarity function that iteratively evaluates similar neighbors. We prove that the iterative similarity computation converges, and propose several pruning and estimation methods. To support matching composite events efficiently, we devise upper bounds on event similarities. Experiments on real and synthetic datasets demonstrate that the proposed event matching approaches achieve significantly higher accuracy than state-of-the-art matching methods.
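
The abstract does not define the similarity function, so the following is only a minimal sketch of the general idea of scoring event pairs by iteratively evaluating their neighbors. It is not the paper's function. It assumes each log has been summarized as a mapping from an event to its direct predecessors, and it uses a SimRank-style averaging rule with a decay factor. The function name `neighbor_similarity`, the parameters `decay`, `tol`, and `max_iter`, and the update rule itself are all hypothetical choices made for illustration.

```python
from itertools import product


def neighbor_similarity(preds_a, preds_b, decay=0.8, tol=1e-6, max_iter=100):
    """Score event pairs across two logs from their predecessors' scores.

    preds_a and preds_b map each event to the list of events that directly
    precede it. The update rule is an illustrative assumption, not the
    similarity function proposed in the paper.
    """
    pairs = list(product(preds_a, preds_b))
    sim = {pair: 0.0 for pair in pairs}
    for _ in range(max_iter):
        new_sim = {}
        for a, b in pairs:
            pa, pb = preds_a[a], preds_b[b]
            if not pa and not pb:
                # Both events start their traces: treat them as fully similar.
                new_sim[(a, b)] = 1.0
            elif not pa or not pb:
                # Only one of them is a start event: no structural evidence.
                new_sim[(a, b)] = 0.0
            else:
                # Average the previous scores of all predecessor pairs,
                # damped by the decay factor.
                total = sum(sim[(p, q)] for p in pa for q in pb)
                new_sim[(a, b)] = decay * total / (len(pa) * len(pb))
        delta = max((abs(new_sim[p] - sim[p]) for p in pairs), default=0.0)
        sim = new_sim
        if delta < tol:
            break  # scores have stopped changing
    return sim


# Two logs with opaque, unrelated names but the same chain structure.
log_1 = {"A": [], "B": ["A"], "C": ["B"]}
log_2 = {"x": [], "y": ["x"], "z": ["y"]}
scores = neighbor_similarity(log_1, log_2)
```

Because the decay factor is below 1, each round changes the scores by at most `decay` times the previous round's largest change, so this particular rule settles to a fixed point. In the example, pairs at the same position in the chain (such as `("B", "y")`) end up with higher scores than misaligned pairs (such as `("B", "z")`), even though the event names share nothing.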

Published In

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
June 2014
1645 pages
ISBN: 9781450323765
DOI: 10.1145/2588555

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. event matching
  2. schema matching

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'14

Acceptance Rates

SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2022) Fuzzy RDF Modeling. Modeling and Management of Fuzzy Semantic RDF Data, pages 71-107. DOI: 10.1007/978-3-031-11669-8_3. Online publication date: 9-Sep-2022
  • (2021) ASSEMBLE: Attribute, Structure and Semantics Based Service Mapping Approach for Collaborative Business Process Development. IEEE Transactions on Services Computing, 14(2), pages 371-385. DOI: 10.1109/TSC.2018.2805346. Online publication date: 1-Mar-2021
  • (2020) Effective and efficient retrieval of structured entities. Proceedings of the VLDB Endowment, 13(6), pages 826-839. DOI: 10.14778/3380750.3380754. Online publication date: 11-Mar-2020
  • (2020) Representing Temporal Attributes for Schema Matching. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 709-719. DOI: 10.1145/3394486.3403115. Online publication date: 23-Aug-2020
  • (2020) Approximate Event Pattern Matching over Heterogeneous and Dirty Sources. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 3237-3240. DOI: 10.1145/3340531.3418506. Online publication date: 19-Oct-2020
  • (2020) Sampling for Big Data Profiling: A Survey. IEEE Access, 8, pages 72713-72726. DOI: 10.1109/ACCESS.2020.2988120. Online publication date: 2020
  • (2020) A Survey of Approximate Quantile Computation on Large-Scale Data. IEEE Access, 8, pages 34585-34597. DOI: 10.1109/ACCESS.2020.2974919. Online publication date: 2020
  • (2019) A Fuzzy RDF Graph-Matching Method Based on Neighborhood Similarity. Emerging Technologies and Applications in Data Processing and Management, pages 184-198. DOI: 10.4018/978-1-5225-8446-9.ch009. Online publication date: 2019
  • (2019) Measuring Business Process Consistency Across Different Abstraction Levels. IEEE Transactions on Network and Service Management, 16(1), pages 294-307. DOI: 10.1109/TNSM.2018.2883362. Online publication date: Mar-2019
  • (2018) Engineering management for high-end equipment intelligent manufacturing. Frontiers of Engineering Management, 5(4), page 420. DOI: 10.15302/J-FEM-2018050. Online publication date: 2018
