Efficient data dissemination using locale covers

https://doi.org/10.1016/j.pmcj.2007.11.001

Abstract

Location-dependent data are central to many emerging applications, ranging from traffic information services to sensor networks. The standard pull- and push-based data dissemination models become unworkable since the data volumes and number of clients are high.

We address this problem using locale covers, a subset of the original set of locations of interest, chosen to include at least one location in a suitably defined neighborhood of any client. Since location-dependent values are highly correlated with location, a query can be answered using a location close to the query point. Typical closeness measures might be Euclidean distance, or a k-nearest neighbor criterion.

We show that location-dependent queries may be answered satisfactorily using locale covers. Our approach is independent of locations and speeds of clients, and is applicable to mobile clients.

We also introduce a nested locale cover scheme that ensures fair access latencies, and allows clients to refine the accuracy of their information over time. We also prove two important results: one regarding the greedy algorithm for sensor covers and the other pertaining to randomized locale covers for k-nearest neighbor queries.

Introduction

The growth in mobile and pervasive computing has led to growing interest in location-specific information services, in which mobile clients seek real-time data relevant to their current location. Unfortunately, a pull-based dissemination model is unworkable, since the typically large number of clients would overwhelm the server with pull requests. Push-based models are better, and broadcasting is already being used for this purpose. For example, XM Traffic & Weather [26] is a satellite-based service that provides real-time traffic flow and incident information, as well as weather data, to vehicles. Similarly, Microsoft’s Smart Personal Objects Technology (SPOT) [17] initiative aims to deliver a wide variety of location services, including traffic and points of interest (gas stations, movie theaters, and so on), through an FM broadcast channel. However, data volumes tend to be large, so even broadcasting [13] must be used cautiously, since it serializes items. A client may have to wait a very long time before data of interest appears in the broadcast.

We briefly discuss some challenges in accommodating mobility in such an environment.

The volume of data may be too high, given the available bandwidth, so that broadcasting all data may cause long broadcast cycles and client waiting time, as well as consume excessive bandwidth. Besides, longer waits require clients to remain in listen mode for longer, wasting battery power.

Tracking each client’s location may require high communication overhead, or even be impossible in applications such as the satellite-based service Sirius [20], where bidirectional communication is not available. Consequently, scheduling-based approaches such as [22], [1], [12], [3] are infeasible in this environment, since they assume knowledge of the access patterns or requests of each mobile client.

Suppose the locations of shopping centers are of interest to a large set of mobile clients, each of which requires information about the shopping centers in its vicinity. Broadcasting data in arbitrary order causes access latencies to be as long as the full broadcast cycle. In some data dissemination models [1], [22], clients communicate their access and mobility patterns to the server, which then schedules data item broadcasts to achieve fairness among clients. However, it is impractical to require such communication, because of power limitations at clients and their large number, or because the server is a satellite [20]. Our scheme allows all clients to obtain approximate but satisfactory data quickly.

Pull allows servers to remain stateless, but this model is inappropriate for our application scenarios. First, the huge number of clients would overload the server with pull requests. Second, for many applications, such as the satellite-based service Sirius [20], bidirectional communication is not available for clients to send pull requests to the server. The push model is better, but requires servers to be stateful, and to keep track of what data is to be sent to which client, and at what time. Servers must also commit resources to initiating and maintaining connections, so that push does not scale well with the number of clients. Our approach allows scalable push.

We propose efficient data-dissemination schemes using the novel concept of locale covers, to address the challenges we have outlined. Given a criterion for closeness, a locale cover includes information about some site close to every client, regardless of its position.

We also extend this idea to define nested locale covers: a nested locale cover is a set of locale covers {L1, …, Lm} such that each site appears in some locale cover Li. We give efficient and practical algorithms to find small locale covers and nested locale covers under different closeness criteria, namely k-nearest neighbor and Euclidean distance. In this scheme the broadcast cycle is divided into several subcycles, such that each subcycle is guaranteed to partially satisfy all clients, and by the end of the broadcast cycle the entire set of sites has been broadcast.
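The subcycle structure described above can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's algorithm: it assumes the simplest randomized construction, in which each site is assigned uniformly at random to one of m groups, and the function names `random_nested_partition` and `broadcast_cycle` are hypothetical. Under this assumption every site appears in exactly one group, but each group is a locale cover only with some probability; for a fixed client with k nearest sites, a given group misses all of them with probability (1 - 1/m)^k.

```python
import random

def random_nested_partition(n_sites, m, seed=None):
    """Assign each site index 0..n_sites-1 to one of m groups uniformly at random.

    Assumption for illustration only: a plain random partition stands in for
    a nested locale cover; each group covers a given client only with
    some probability, not with certainty.
    """
    rng = random.Random(seed)
    groups = [[] for _ in range(m)]
    for site in range(n_sites):
        groups[rng.randrange(m)].append(site)
    return groups

def broadcast_cycle(groups):
    """Concatenate the groups into one broadcast cycle, one subcycle per group."""
    return [site for group in groups for site in group]

if __name__ == "__main__":
    groups = random_nested_partition(n_sites=20, m=4, seed=0)
    for number, group in enumerate(groups, start=1):
        print("subcycle", number, group)
    # Every site is broadcast exactly once per full cycle.
    print(sorted(broadcast_cycle(groups)) == list(range(20)))
```

A client that tunes in at the start of any subcycle hears a sample of sites spread over the whole region, and a client that listens to further subcycles accumulates more sites, which is the refinement-over-time behaviour the text describes.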

In our schemes, the server needs no knowledge of the locations of clients, or their mobility and data-access patterns. The accuracy of the approximation can be controlled by defining the locale appropriately. Dissemination schemes using nested locale covers ensure fair access latencies. Nested locale covers allow clients to refine the accuracy of their information over time. However, our experiments suggest that nested locale covers satisfy clients at a performance marginally lower than that of random broadcast. It is not yet clear to us whether all such schemes must suffer from lowered performance, or whether better schemes than the one proposed here might exist.

We make the following contributions.

  • (1) We present an O(n^2 k) algorithm for finding a locale cover under the k-nearest neighbor closeness criterion (Section 3.2). Previously known algorithms for this problem had O(n^3) time complexity. We also give a partitioning-based heuristic to improve its scalability (Section 3.3).

  • (2) We demonstrate through extensive experiments that the proposed technique obtains locale covers of size approximately 2n/k for the datasets used, regardless of distribution (Section 6.2).

  • (3) We solve the locale cover problem under the Euclidean closeness criterion by reducing it to a sensor cover problem [10] (Section 4.2.1).

  • (4) For the greedy algorithm of [10], we theoretically quantify the relationship between the size of the cover and the fraction of the region left uncovered (Section 4.2.2). This result directly characterizes the quality of a locale cover obtained through a greedy algorithm.

  • (5) We present randomized algorithms to compute nested locale covers.

  • (6) We also give probabilistic guarantees on the quality of the cover obtained by the proposed algorithm (Section 5).
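To put the 2n/k cover size from contribution (2) in context, the following back-of-the-envelope comparison counts broadcast slots in the worst case. It rests on assumptions that are ours rather than the paper's: one site per slot, a flat broadcast cycle, and a client that is satisfied as soon as it hears any one of its k nearest sites. The value k = 10 is chosen only for illustration; n = 30,000 matches the size of the synthetic datasets.

```latex
\begin{align*}
T_{\mathrm{full}}  &\le n - k + 1
  && \text{all $n$ sites in arbitrary order} \\
T_{\mathrm{cover}} &\le |L| \approx \frac{2n}{k}
  && \text{cycle restricted to a locale cover $L$} \\[4pt]
n = 30{,}000,\ k = 10:\quad
T_{\mathrm{full}}  &\le 29{,}991,
  \qquad T_{\mathrm{cover}} \le |L| \approx 6{,}000
\end{align*}
```

The first bound arises because, in an arbitrary order, all n - k sites outside the client's locale may precede the first useful one. The second holds because a locale cover contains at least one site from every client's locale, so one full pass over the cover suffices.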

Our notion of locale cover is very general and is likely to be applicable beyond the domain of data dissemination, for example, in spatio-temporal data mining, and approximate indices.

Fig. 1(a) and (b) depict instances of locale covers when locales are defined in terms of the k-neighbor and Euclidean distance criteria, respectively. In Fig. 1(a), the small circles (dark or transparent) represent a collection of one hundred sites distributed over a normalized unit square region. If a client’s locale is set as its five nearest sites, the dark sites form a locale cover, since they include at least one of the five nearest sites for every point in the region. Fig. 1(b) shows the corresponding solution when locales are defined as disks of fixed size (r=0.125).
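The covering condition illustrated in Fig. 1(a) can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not the paper's procedure: it tests the condition only on a finite sample of query points rather than on every point of the region, and the helper names `k_nearest` and `is_locale_cover` are hypothetical.

```python
import math
import random

def k_nearest(point, sites, k):
    """Return the indices of the k sites closest to `point` (Euclidean distance)."""
    order = sorted(range(len(sites)), key=lambda i: math.dist(point, sites[i]))
    return order[:k]

def is_locale_cover(cover, sites, k, queries):
    """True if every query point has at least one of its k nearest sites in `cover`.

    `cover` is a set of site indices. `queries` is a finite sample of client
    positions, so this is a spot check, not a proof over the whole region.
    """
    return all(any(i in cover for i in k_nearest(q, sites, k)) for q in queries)

if __name__ == "__main__":
    rng = random.Random(0)
    sites = [(rng.random(), rng.random()) for _ in range(100)]    # 100 sites, unit square
    queries = [(rng.random(), rng.random()) for _ in range(500)]  # sampled client positions
    print(is_locale_cover(set(range(100)), sites, 5, queries))  # the full site set always covers
    print(is_locale_cover(set(), sites, 5, queries))            # the empty set never covers
```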

The rest of the paper is organized as follows. Section 2 provides a brief overview of concepts developed in this work. Sections 3 and 4 present techniques to obtain locale covers under the k-nearest neighbor and Euclidean distance criteria, respectively. In Section 5, we discuss nested locale covers for each of these criteria. Finally, in Section 6, we provide experimental results showing that our techniques are efficient and result in small locale covers.

Section snippets

Our approach: Locale covers

Central to our approach is the notion of locale cover, an idea interesting in its own right.

k-domain locale covers

The k-domain locale cover problem is to find an L ⊆ X that hits the compass of every k-domain. That is, if F is the family of subsets defined as F = {D̂ | D is a disk, |D̂| = k}, then a hitting set of F is a k-domain locale cover.
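The hitting-set view can be made concrete with a standard greedy heuristic. The sketch below is a generic illustration, not the O(n^2 k) algorithm of the paper. It reads D̂ as the set of sites inside the disk D (our interpretation of the snippet), and it approximates the family F by the k-nearest-site sets of randomly sampled points, which captures only some of the sets in F. The names `knn_set` and `greedy_hitting_set` are hypothetical.

```python
import math
import random

def knn_set(point, sites, k):
    """Indices of the k sites nearest to `point`, as a frozenset."""
    order = sorted(range(len(sites)), key=lambda i: math.dist(point, sites[i]))
    return frozenset(order[:k])

def greedy_hitting_set(family):
    """Greedy heuristic: repeatedly pick the element that hits the most unhit sets.

    Elements are assumed to be integer site indices; ties go to the smallest index.
    """
    unhit = [s for s in set(family) if s]  # drop duplicates and empty sets
    chosen = set()
    while unhit:
        counts = {}
        for s in unhit:
            for e in s:
                counts[e] = counts.get(e, 0) + 1
        best = max(counts, key=lambda e: (counts[e], -e))
        chosen.add(best)
        unhit = [s for s in unhit if best not in s]
    return chosen

if __name__ == "__main__":
    rng = random.Random(1)
    sites = [(rng.random(), rng.random()) for _ in range(100)]
    samples = [(rng.random(), rng.random()) for _ in range(2000)]
    family = [knn_set(p, sites, 5) for p in samples]  # sampled approximation of F
    cover = greedy_hitting_set(family)
    print(len(cover), "of", len(sites), "sites selected")
```

Because only sampled points are considered, the result hits every sampled set but may miss k-domains that no sample happened to generate; a denser sample narrows that gap at higher cost.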

We first argue, using results from machine learning theory [23], that a small locale cover exists regardless of the spatial distribution of the sites. Then we describe our algorithm for efficiently finding a small k-domain locale cover. The algorithm runs in

r-disk locale covers

We give two algorithms to compute r-disk locale covers. The first is similar in flavor to the algorithm for k-domain locale covers, but has O(n^3) complexity. The second is more efficient, and is obtained by reducing the problem to a sensor cover problem [10].
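The flavor of the sensor-cover reduction can be illustrated with a greedy selection over a discretized region. This is a rough sketch under our own assumptions, not the algorithm of [10] or of the paper: the unit square is replaced by a grid of cell centres, a grid point counts as coverable if some site lies within distance r of it, and sites are picked greedily until every coverable grid point is within r of a chosen site. The function name `greedy_disk_cover` and the `grid` parameter are hypothetical.

```python
import math
import random

def greedy_disk_cover(sites, r, grid=40):
    """Greedily choose site indices whose r-disks cover every coverable grid point.

    Assumption for illustration: the unit square is discretized into
    grid x grid cell centres; exact geometric coverage is not computed.
    """
    points = [((i + 0.5) / grid, (j + 0.5) / grid)
              for i in range(grid) for j in range(grid)]
    # For each site, the set of grid-point indices inside its r-disk.
    reach = [{p for p, pt in enumerate(points) if math.dist(pt, s) <= r}
             for s in sites]
    uncovered = set().union(*reach)  # grid points that some site can cover
    chosen = []
    while uncovered:
        # Pick the site that covers the most still-uncovered grid points.
        best = max(range(len(sites)), key=lambda i: len(reach[i] & uncovered))
        chosen.append(best)
        uncovered -= reach[best]
    return chosen

if __name__ == "__main__":
    rng = random.Random(2)
    sites = [(rng.random(), rng.random()) for _ in range(100)]
    chosen = greedy_disk_cover(sites, r=0.125)  # r as in Fig. 1(b)
    print(len(chosen), "of", len(sites), "sites selected")
```

A finer grid gives a closer approximation of the continuous region, at a cost that grows with the number of grid points times the number of sites.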

Nested locale covers

Larger locale covers will clearly be needed as the desired accuracy increases. Designing locale covers to meet the requirements of the most demanding client would be wasteful and unfair, since average wait time increases with the size of the broadcast cycle. Nested locale covers ensure fairness with respect to access latency. A nested locale cover is a set of locale covers {L1, …, Lm} such that each site appears in some locale cover Li. A client receives data about at least one of its neighboring

Performance evaluation

This section presents experimental results for the proposed locale cover technique, using both synthetic and real datasets. The synthetic datasets, UN30k and GN30k, each have 30,000 sites within a unit square, with random and Gaussian distributions, respectively.

Real dataset CH represents 7972 road intersections in Chicago. Dataset CA, obtained from [19], contains 3257 traffic stations monitoring traffic speed on the freeways across the state of California. Dataset SH, obtained from [18], contains

Related work

Data dissemination using wireless broadcasting has been extensively studied for location-based services within the research and commercial communities.

Conclusions

We introduce the notion of locale cover, and present several novel formulations and variants of the data dissemination problem for location-dependent data in broadcasting environments. Our schemes choose a small subset of sites that includes a site in the neighborhood of every client, regardless of the number or distribution of clients. This method significantly reduces broadcast bandwidth and access latencies for clients, and scales well with the number of users and sites. Our experiments confirm the

Acknowledgments

We would like to thank Dimitrios Gunopulos for several discussions and useful comments. This work was supported in part by grants from Tata Consultancy Services, Inc., and a matching grant from the MICRO program of the University of California.

Dr. Sandeep Gupta received the B.S. degree in computer science from the Indian Institute of Technology (IIT), Guwahati and the Ph.D. degree in computer science from the University of California, Riverside in 2000 and 2006, respectively. He is currently a postdoctoral fellow at the San Diego SuperComputing Center. His research interests include databases, data mining, algorithms, and computational geometry.

References (30)

  • S. Acharya, R. Alonso, M. Franklin, S. Zdonik, Broadcast disks: Data management for asymmetric communication...
  • S. Ahmadi et al., Greedy random adaptive memory programming search for the capacitated clustering problem, Eur. J. Oper. Res. (2003)
  • D. Aksoy, M. Franklin, Scheduling for large-scale on-demand data broadcasting, in: Proceedings of IEEE INFOCOM, 1998,...
  • N. Alon et al., The Probabilistic Method (1992)
  • T. Asano et al., Clustering algorithms based on minimum and maximum spanning trees
  • F. Aurenhammer et al., A simple on-line randomized incremental algorithm for computing higher order Voronoi diagrams
  • J.-D. Boissonnat et al., A semi-dynamic construction of higher order Voronoi diagrams and its randomized analysis, Algorithmica (1993)
  • H. Bronnimann et al., Almost optimal set covers in finite VC-dimension: (Preliminary version)
  • J. Flototto, M. Yvinec, Order-k Voronoi diagrams, in: Proceedings of 17th European Workshop on Computational Geometry,...
  • H. Gupta, S.R. Das, Q. Gu, Connected sensor cover: Self-organization of sensor networks for efficient query execution,...
  • D. Haussler et al., Epsilon-nets and simplex range queries, Discrete Comput. Geom. (1987)
  • Q. Hu, D.-L. Lee, W.-C. Lee, Dynamic data delivery in wireless communications environments, in: Proceedings of Workshop...
  • T. Imielinski et al., Data on air: Organization and access, IEEE TKDE (1997)
  • M.E.J. Leutenegger, S.T. Lopez, STR: A simple and efficient algorithm for R-tree packing, in: Proceedings of the 13th...
  • J. Matousek (2002)
    Dr. Jinfeng Ni is a staff software engineer in IBM’s Silicon Valley Laboratory. He holds Bachelor's and Master's degrees in computer science from the University of Science and Technology of China, and a Ph.D. in Computer Science from the University of California, Riverside. His research interests include spatio-temporal databases, XML and security.

    Chinya V. Ravishankar is a Professor of Computer Science and Associate Dean in the Bourns College of Engineering at the University of California, Riverside. From 1986 to 1999, he was on the faculty of the Electrical Engineering and Computer Science Department at the University of Michigan–Ann Arbor.

    Prof. Ravishankar’s research is currently in the areas of Databases, Networking, and Security. He holds an undergraduate degree in Chemical Engineering from the Indian Institute of Technology, Bombay, and a Ph.D. in Computer Science from the University of Wisconsin — Madison. He is a Senior Member of the Institute of Electrical and Electronics Engineers, and a member of the Association for Computing Machinery.

    This paper is an extended version of [S. Gupta, J. Ni, C.V. Ravishankar, Efficient data dissemination using locale covers, in: Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, 2005].
