Top-k Queries over Distributed Uncertain Categorical Data

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 12130)

Abstract

Uncertain data arises in many modern applications, including sensor networks, data integration, and information extraction. Often this data is distributed, and there is a need for efficient query processing over the data in situ. We focus on answering top-k queries and propose a distributed algorithm, TDUD, that efficiently answers top-k queries over distributed uncertain categorical data in a single round of communication. TDUD uses a distributed index structure composed of local uncertain indexes (LUIs) on local sites and a single global uncertain index (GUI) on a coordinator site. Our algorithm minimizes the amount of communication needed to answer a top-k query by maintaining the mean sum dispersion of the probability distribution on each site. Extensive experiments are conducted to verify the effectiveness and efficiency of the proposed methods in terms of communication costs and response time. We show empirically that TDUD is near-optimal in that it can typically retrieve the top-k query answers by communicating only k tuples in a single round.
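To make the problem setting concrete, the following is a minimal sketch of a generic single-round baseline, not the TDUD algorithm itself and not the LUI/GUI index structure. It assumes that each uncertain tuple carries a probability distribution over category values and that a top-k query for a value returns the k tuples most likely to equal it; these semantics, the data layout, and the names `local_top_k` and `baseline_top_k` are illustrative assumptions rather than definitions taken from the chapter. In this baseline every site ships its local top-k to the coordinator, so up to m × k tuples cross the network for m sites, which is the cost that the abstract's claim of communicating only k tuples should be read against.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical layout: a site maps tuple ids to a distribution over category values.
Site = Dict[str, Dict[str, float]]


def local_top_k(site: Site, value: str, k: int) -> List[Tuple[float, str]]:
    """Return up to k (probability, tuple_id) pairs with the highest P(tuple = value)."""
    scored = ((dist.get(value, 0.0), tid) for tid, dist in site.items())
    # Tuples that cannot take the queried value are never candidates.
    return heapq.nlargest(k, (pair for pair in scored if pair[0] > 0.0))


def baseline_top_k(sites: List[Site], value: str, k: int) -> List[Tuple[float, str]]:
    """Coordinator side: merge every site's local top-k in a single round."""
    candidates = [pair for site in sites for pair in local_top_k(site, value, k)]
    return heapq.nlargest(k, candidates)


# Toy data, invented for illustration only.
sites = [
    {"t1": {"brake": 0.7, "tire": 0.3}, "t2": {"brake": 0.2, "engine": 0.8}},
    {"t3": {"brake": 0.9, "tire": 0.1}, "t4": {"engine": 1.0}},
]
result = baseline_top_k(sites, "brake", 2)
print(result)
```

The baseline is exact because any tuple in the global top-k must also be in the local top-k of the site that holds it; what it does not do is bound how many of the shipped tuples are actually needed.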

Notes

  1. We will call this approach NAIV in our experiments.

  2. We will call this approach DUTk in our experiments.

  3. To avoid a null value of the SDM (\(M(S,d) = 0\)), V should contain at least two different values of the attached probabilities. We note that in practice probability values are different.

  4. Actually, in this case, we can request the top-k tuples approximately.

  5. In the example, we report the real-number estimates, but our algorithm takes the integer floor of this number. See Eq. 2.

Acknowledgements

We thank Salima Benbernou and Renée Miller for their relevant and helpful comments and suggestions, which greatly improved the manuscript.

Author information

Corresponding author

Correspondence to Soror Sahri.

Copyright information

© 2020 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Benaissa, A., Sahri, S., Ouziri, M. (2020). Top-k Queries over Distributed Uncertain Categorical Data. In: Hameurlain, A., Tjoa, A. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLIII. Lecture Notes in Computer Science, vol 12130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-62199-8_2

  • DOI: https://doi.org/10.1007/978-3-662-62199-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-62198-1

  • Online ISBN: 978-3-662-62199-8

  • eBook Packages: Computer Science (R0)
