research-article

Decentralized Fault-Tolerant Event Correlation

Published: 07 August 2014

Abstract

Although event correlation techniques are expected to be used for monitoring critical, complex infrastructures and for dealing with disasters in the physical world, little work exists on making event correlation systems themselves tolerant to failures. Existing systems provide no guarantees on event delivery, do not support multicast and thus provide no guarantees across individual processes, or rely on centralized components or strong assumptions about the infrastructure.
The FAIDECS system attempts to reconcile strong guarantees with practical performance in the presence of process crash failures. To that end, FAIDECS uses an overlay network with specific guarantees that match its proposed correlation language and the guarantees it offers. However, the proposed language lacks expressivity, and the system itself supports only very specific, rigid semantics that cannot accommodate even fundamental features such as sliding windows.
After providing a comprehensive overview of the FAIDECS model and system, this article bridges the gap between strong guarantees and more established correlation languages and systems in several steps. First, we propose alternative semantics for several modules of the FAIDECS matching engine and revisit its guarantees. Second, we pinpoint which guarantees are contradicted by which combinations of semantic options. Third, we investigate four correlation languages (StreamSQL, EQL, CEL, and TESLA), showing which semantic options their respective features correspond to in our model and thus, ultimately, which guarantees of FAIDECS are maintained by which language features.

Published In

ACM Transactions on Internet Technology, Volume 14, Issue 1
Special Issue on Event Recognition
July 2014
161 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/2659232
Editor: Munindar P. Singh

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 August 2014
Accepted: 01 April 2014
Revised: 01 March 2014
Received: 01 October 2013
Published in TOIT Volume 14, Issue 1

Author Tags

  1. Event
  2. agreement
  3. correlation
  4. fault tolerance
  5. guarantee
  6. order

Qualifiers

  • Research-article
  • Research
  • Refereed
