Elsevier

Computer Networks

Volume 54, Issue 1, 15 January 2010, Pages 165-174
Receiver-oriented design of Bloom filters for data-centric routing

https://doi.org/10.1016/j.comnet.2009.10.002

Abstract

A Bloom filter (BF) is a space-efficient data structure that represents a large set of items and supports efficient membership queries. Bloom filters have been widely proposed for use in routing entries to facilitate data-centric routing in network applications. The existing designs of Bloom filters, however, cannot effectively support in-network queries. Given a query for a data item at a node in the network, the noise in unrelated routing entries is very likely to equal the useful information about the item in the right routing entries. Consequently, the majority of queries are routed towards many wrong nodes in addition to the intended destinations, wasting a large amount of network traffic. To address this issue, we classify the existing designs as CUBF (Cumulative Bloom filters) and ABF (Aggregated Bloom filters), and then evaluate their performance in routing queries in noisy environments. Based on the evaluation results, we propose a receiver-oriented design of Bloom filters that sufficiently restricts the probability of a wrong routing decision. Moreover, we significantly decrease the delay of a routing decision in the case of CUBF by using the bit slice approach, and reduce the transmission size of each BF in the case of ABF by using the compression approach. Both the theoretical analysis and experimental results demonstrate that our receiver-oriented design of Bloom filters clearly outperforms the existing approaches in terms of the success probability of routing and network traffic cost.

Introduction

A Bloom filter (BF) [1] is a space-efficient data structure that represents a large set of items and supports efficient membership queries. A BF outperforms other data structures such as binary search trees and tries, because the time needed to insert an item or to check whether an item belongs to the represented set is constant, irrespective of the cardinality of the set. Hence BFs have been widely adopted in database and networking applications [1], [2], such as web cache sharing [3] and routing [4], [5], [6]. Moreover, BFs have great potential in memory management, such as summarizing streaming data in memory [7], storing the states of a large number of flows in the on-chip memory of routers [8], and speeding up Bayesian filters [9].
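As a minimal illustration of the structure described above, the following Python sketch keeps an m-bit vector and sets k positions per inserted item. The class name, the use of SHA-256, and the double-hashing scheme for deriving the k positions are assumptions made for this example; the source does not prescribe particular hash functions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit vector with k hash positions per item."""

    def __init__(self, m: int, k: int):
        self.m = m      # number of bits in the vector
        self.k = k      # number of hash positions per item
        self.bits = 0   # a Python int used as the bit vector, initially all 0

    def _positions(self, item: str):
        # Assumption: derive k positions from one SHA-256 digest by double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Insertion sets the k addressed bits to 1.
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # A query reports membership only if all k addressed bits are 1.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bf = BloomFilter(m=1024, k=4)
bf.add("item-a")
bf.add("item-b")
print("item-a" in bf)  # inserted items are always reported as present
```

Both operations touch exactly k bits, which is why their cost does not depend on how many items the filter already holds. A query for an item that was never inserted can still return true when other items happen to have set all k of its bits.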

The space efficiency of a BF, however, is achieved at the cost of false positive judgments. A false positive judgment, in which an item does not belong to a set but the BF reports that it does, is a non-negligible drawback of BFs. In many applications, the savings in storage and computational costs brought by BFs outweigh this drawback, provided that the false positive probability is sufficiently low. Many efforts have been made in recent years to reduce the false positive probability in stand-alone and distributed systems [10], [11], [12], [13].
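To make the trade-off concrete, the sketch below evaluates the standard textbook approximation of the false positive probability, (1 - e^(-kn/m))^k, for a BF of m bits holding n items with k hash functions, together with the well-known choice k = (m/n) ln 2 that minimizes it. The function names are hypothetical, and the formula is the usual approximation for a stand-alone BF rather than the analysis developed in this paper.

```python
import math


def false_positive_probability(m: int, n: int, k: int) -> float:
    """Approximate false positive probability of a stand-alone Bloom filter."""
    # Probability that one bit is still 0 after n insertions is about e^(-kn/m);
    # a false positive needs all k probed bits to be 1.
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> int:
    """Number of hash functions that minimizes the approximation above."""
    return max(1, round((m / n) * math.log(2)))


m, n = 1000, 100          # hypothetical sizes: 10 bits per item
k = optimal_k(m, n)
print(k, false_positive_probability(m, n, k))
```

With 10 bits per item the approximation gives a false positive probability below one percent, which illustrates why the drawback is often acceptable for a single filter.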

In the last few years, BFs have been proposed for supporting data-centric routing in overlay networks [4], [5], [14], [15], [16], wireless sensor networks [17], ad hoc networks [18], [19], and mesh networks [20]. The common idea of those proposals is that every node uses a BF to represent the information of all its data items and broadcasts the BF to the nodes in its propagation range, i.e., the nodes within d hops of the node. Correspondingly, a node receives a number of BFs via each of its links. Each link, associated with the BFs received via it or with their union, is then maintained as a routing entry. The union of BFs is defined as the logical OR of their bit vectors [21]. An intermediate node routing a query forwards it through the link whose corresponding routing entry satisfies the query. Ideally, a query is propelled towards its destination once it enters the propagation range of the destination. For example, a query at node E for a data item at node A is routed to the right destination node along a single path ECBA, as shown in Fig. 1a.
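The routing idea in the preceding paragraph can be sketched as follows. Each routing entry is the bitwise OR of the BFs received over one link, and a query is forwarded over every link whose entry reports the requested item. The filter size M, the hash count K, the hashing scheme, the link names, and the item names are all hypothetical values chosen for illustration.

```python
import hashlib

M, K = 256, 3  # hypothetical filter size (bits) and number of hash functions


def positions(item: str):
    # Assumption: K positions derived from a SHA-256 digest by double hashing.
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % M for i in range(K)]


def make_filter(items) -> int:
    """Build the BF (as an int bit vector) that a node broadcasts for its items."""
    bits = 0
    for item in items:
        for pos in positions(item):
            bits |= 1 << pos
    return bits


def matches(bits: int, item: str) -> bool:
    return all((bits >> pos) & 1 for pos in positions(item))


def union(filters) -> int:
    """Union of BFs: logical OR of their bit vectors."""
    result = 0
    for bits in filters:
        result |= bits
    return result


def candidate_links(routing_table: dict, item: str):
    """Links whose routing entry satisfies the query for `item`."""
    return [link for link, entry in routing_table.items() if matches(entry, item)]


# Hypothetical node with two links; one entry aggregates the BFs of two nodes.
routing_table = {
    "link-to-B": union([make_filter(["a1", "a2"]), make_filter(["b1"])]),
    "link-to-G": make_filter(["g1", "g2"]),
}
print(candidate_links(routing_table, "a1"))
```

The link that leads to the item's holder is always among the candidates, because a union never clears bits. Any additional link in the result is a false positive of that link's routing entry.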

False positive judgments exist in data-centric routing with BFs. A necessary condition for BF-based routing schemes to outperform blind routing schemes is that the false positive probability in the routing entries is sufficiently low. Otherwise, given any query for a data item x at a node in the network, the noise in unrelated routing entries is very likely to equal the useful information about the item in the right routing entries. The noise in an unrelated routing entry is defined as the amount of membership information of x in it, when the node does not receive a BF from the destination of x via the corresponding link. Being misled, the majority of queries are routed towards many wrong nodes in addition to the right destinations, resulting in a huge number of redundant and useless queries in the network. For example, a query at node E for a data item at node A is routed to both the right destination node and two other nodes, G and H, which do not hold the desired data item, as shown in Fig. 1b. The network in turn exhibits poor query-processing efficiency and suffers from scalability problems.
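A rough numerical illustration of why routing entries become noisy is given below. It assumes, as a simplification that is not taken from the paper's own analysis, that a routing entry is the union of t BFs of identical size m and hash count k, each holding n distinct items, so that the entry behaves like a single BF holding t·n items. The function name and parameter values are hypothetical.

```python
import math


def union_false_positive(m: int, n: int, k: int, t: int) -> float:
    """Approximate false positive probability of the union of t Bloom filters.

    Simplifying assumption: the t filters share m and k and hold disjoint sets
    of n items each, so the union is equivalent to one filter with t * n items.
    """
    return (1.0 - math.exp(-k * t * n / m)) ** k


m, n, k = 1000, 100, 7  # hypothetical: each node's own BF is well dimensioned
for t in (1, 2, 4, 8):
    print(t, round(union_false_positive(m, n, k, t), 4))
```

Under these assumptions, a filter dimensioned for one node's items degrades quickly once several such filters are merged into one entry, which is consistent with the observation that unrelated links can appear to hold the queried item.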

In this paper, we show through theoretical analysis that the existing designs of BFs cannot satisfy the aforementioned necessary condition. Specifically, we classify the existing BF-based routing schemes as CUBF (Cumulative Bloom filters) and ABF (Aggregated Bloom filters), and then evaluate their performance in routing queries in noisy environments. Based on the evaluation results, we propose a receiver-oriented design of Bloom filters, with which the false positive probability of any routing entry is low enough for a node to correctly distinguish the right outgoing link from the others. We further conduct extensive simulations to evaluate the performance of the proposed scheme. Both the theoretical analysis and experimental results demonstrate that our receiver-oriented design of BFs clearly outperforms the existing designs in terms of success probability of routing and network traffic cost, as shown in Fig. 1.

The rest of this paper is organized as follows. In Section 2, we briefly introduce the BF and its traditional design. In Section 3, we summarize the state-of-the-art designs of BF-based data-centric routing schemes, and then propose the receiver-oriented design of BFs. In Section 4, we further optimize the transmission and storage strategies of BFs, and present our study on how to ensure that the false positive probability of any routing entry does not exceed an upper bound in practice. Section 5 presents the performance evaluation results. We conclude this work in Section 6.

Section snippets

Preliminaries

A BF for representing a set X of n items is described by a vector of m bits, initially all set to 0. A BF uses k independent hash functions h_1, ..., h_k to map each item of X uniformly to a random number in the range {1, ..., m} [1]. For each item x of X, we define its BF address as bfaddress(x), consisting of h_i(x) for 1 ≤ i ≤ k, and the bits belonging to bfaddress(x) are set to 1 when inserting x. Once the set X is represented as a BF, to judge whether an element x belongs to X, one just needs to check

Receiver-oriented Bloom filters

In this section, we first introduce the state-of-the-art designs of BF-based data-centric routing schemes, including CUBF and ABF. We then propose a receiver-oriented BF, which is superior to the existing designs with respect to the false positive probability.

Discussion

To further improve our approach, we address two key issues: optimizing the transmission size of ABF and the storage strategy of CUBF. We then discuss how to ensure that the false positive probability of any routing entry does not exceed an upper bound in practice.

Performance evaluation

In this section, we first evaluate the performance of CUBF and ABF in terms of the optimal number of hash functions, the false positive probability, and the transmission size. We then conduct experiments to examine the false positive probability of CUBF and ABF in practice.

Conclusion

BF-based data-centric routing has been widely used and extensively studied in the field of network applications. In this paper, we study the false positive problem in the traditional designs of BFs in data-centric routing schemes. We disclose that previous data-centric routing schemes using Bloom filters cannot facilitate in-network queries correctly, due to the noise in BF-based routing entries. Based on the evaluation results of previous designs, namely CUBF and ABF, we propose the

Acknowledgments

This work is supported in part by NSF China under Grants Nos. 60903206, 60903225, National Basic Research Program of China (973 Program) under Grants Nos. 2009CB3020402, 61364, and National High Technology Research and Development Program of China (863 Program) under Grant No. 2008AA01Z216.

References (30)

  • W.H. Yuen et al., Improving search efficiency using bloom filters in partially connected ad hoc networks: a node-centric analysis, Computer Communications (2007)
  • A. Broder et al., Network applications of bloom filters: a survey, Internet Mathematics (2005)
  • J. Mullin, Optimal semijoins for distributed database systems, IEEE Transactions on Software Engineering (1990)
  • L. Fan et al., Summary cache: a scalable wide-area web cache sharing protocol, IEEE/ACM Transactions on Networking (2000)
  • J. Li, J. Taylor, L. Serban, M. Seltzer, Self-organization in peer-to-peer system, in: Proceedings of the 10th ACM...
  • S. Rhea, J. Kubiatowicz, Probabilistic location and routing, in: Proceedings of the IEEE INFOCOM, New York, USA, 2004,...
  • A. Kumar, J. Xu, E. Zegura, Efficient and scalable query routing for unstructured peer-to-peer networks, in:...
  • F. Deng, D. Rafiei, Approximately detecting duplicates for streaming data using stable bloom filters, in: Proceedings...
  • F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, G. Varghese, Beyond bloom filters: from approximate membership...
  • K. Li, Z. Zhong, Fast statistical spam filter by approximate classifications, in: Proceedings of the...
  • M. Jimeno, K. Christensen, A. Roginsky, A power management proxy with a new best-of-n bloom filter design to reduce...
  • F. Hao, M. Kodialam, T.V. Lakshman, Building high accuracy bloom filters using partitioned hashing, in: Proceedings of...
  • B. Chazelle, J. Kilian, R. Rubinfeld, A. Tal, The bloomier filter: an efficient data structure for static support...
  • R.P. Laufer, P.B. Velloso, O.C.M.B. Duarte, Generalized bloom filters, Tech. Rep. Research Report GTA-05-43, University...
  • T. Hodes et al., An architecture for secure wide-area service discovery, Wireless Networks (2002)

Deke Guo received the BE degree in industry engineering from Beijing University of Aeronautic and Astronautic, Beijing, China, in 2001, and the PhD degree in management science and engineering from National University of Defense Technology, Changsha, China, in 2008. He was a visiting scholar in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology from January 2007 to January 2009. Currently, he is an assistant professor of Information System and Management, National University of Defense Technology, Changsha, China. His current interests include peer-to-peer computing, pervasive computing, and wireless multi-hop networks. He is a member of the ACM and the IEEE.

Yuan He received his BE degree from the Department of Computer Science and Technology, University of Science and Technology of China, in 2003, and his ME degree from the Institute of Software, Chinese Academy of Sciences, in 2006. He is now a PhD student in the Department of Computer Science and Engineering at Hong Kong University of Science and Technology, supervised by Dr. Yunhao Liu. His research interests include peer-to-peer computing, sensor networks, and pervasive computing. He is a student member of the IEEE and the IEEE Computer Society.

Panlong Yang received his BS degree, MS degree, and PhD degree in communication and information system from Nanjing Institute of Communication Engineering, China, in 1999, 2002, and 2005, respectively. From November 2006 to March 2009, he was a postdoctoral fellow in the Department of Computer Science, Nanjing University. He is now an associate professor in the Nanjing Institute of Communication Engineering. His research interests include wireless mesh networks, wireless sensor networks and cognitive radio networks. He is a member of the IEEE Computer Society and ACM SIGMOBILE Society.
