Skyline Queries over Incomplete Data - Error Models for Focused Crowd-Sourcing

Conference paper in Conceptual Modeling (ER 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8217)

Abstract

Skyline queries are a well-known technique for explorative retrieval, multi-objective optimization, and personalization tasks in databases, and they are widely valued for their intuitive query formulation. However, on incomplete datasets skyline query processing is severely hampered and often has to resort to error-prone heuristics. Incomplete datasets are nonetheless common, owing to the widespread use of automated information extraction and aggregation. In this paper, we evaluate and compare several established heuristics for adapting skylines to incomplete datasets, focusing specifically on the error they introduce into the skyline result. Building on these results, we argue for improving skyline result quality by employing crowd-enabled databases, which allow some database operators to be outsourced dynamically to human workers and thus enable missing values to be elicited at runtime. Yet each crowd-sourcing operation incurs monetary and query runtime costs. Our main contribution is therefore an error model that lets us concentrate crowd-sourcing on those tuples that are highly likely to be error-prone, while relying on established heuristics for safer tuples. This technique of focused crowd-sourcing allows us to strike a good balance between cost and result quality.
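
The abstract does not spell out the paper's error model, so the following is only a hypothetical illustration of the general idea of separating "safe" tuples from error-prone ones. The Python sketch checks each tuple's skyline membership under the most pessimistic and the most optimistic completion of all missing values; tuples whose outcome differs between the two are the ones a focused crowd-sourcing step would target. The assumptions are stated loudly: smaller values are better on every attribute, every missing value lies within the range observed for its attribute, and the names `dominates`, `fill`, `is_undominated`, `classify` and the hotel data are invented for this sketch. This worst-case bound check is a simpler stand-in, not the error model proposed in the paper.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical data model for this sketch: each tuple is a sequence of numeric
# attribute values, None marks a missing value, and smaller is better everywhere.
Row = Tuple[Optional[float], ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance: a is no worse than b on every attribute and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fill(row: Row, values: Sequence[float]) -> Tuple[float, ...]:
    """Replace every missing value in row by the per-attribute value given."""
    return tuple(f if v is None else v for v, f in zip(row, values))


def is_undominated(rows: List[Tuple[float, ...]], i: int) -> bool:
    """True if no other (completed) row dominates row i."""
    return not any(dominates(s, rows[i]) for j, s in enumerate(rows) if j != i)


def classify(rows: List[Row]) -> List[str]:
    """Label each tuple 'in', 'out' or 'uncertain' with respect to the skyline.

    Assumption: every missing value lies between the best and worst value
    observed for its attribute (each attribute needs at least one known value).
    """
    columns = list(zip(*rows))
    best = [min(v for v in c if v is not None) for c in columns]
    worst = [max(v for v in c if v is not None) for c in columns]
    labels = []
    for i, row in enumerate(rows):
        # Pessimistic completion: row i's gaps get the worst value, all other gaps the best.
        pess = [fill(r, best) for r in rows]
        pess[i] = fill(row, worst)
        # Optimistic completion: row i's gaps get the best value, all other gaps the worst.
        opt = [fill(r, worst) for r in rows]
        opt[i] = fill(row, best)
        if is_undominated(pess, i):
            labels.append("in")         # skyline member under every completion
        elif not is_undominated(opt, i):
            labels.append("out")        # dominated under every completion
        else:
            labels.append("uncertain")  # outcome depends on unknown values
    return labels


if __name__ == "__main__":
    # Hypothetical hotels: (price, distance to beach); None = value not extracted.
    hotels: List[Row] = [
        (50, 2.0),
        (90, 1.0),
        (70, None),
        (None, 2.5),
        (100, 3.0),
    ]
    print(classify(hotels))  # ['in', 'uncertain', 'uncertain', 'out', 'out']
```

Because dominance is monotone (making the dominating tuple worse or the dominated tuple better can only break dominance), the two extreme completions bound every possible completion within the assumed ranges. In the example, the complete hotel at index 1 is still uncertain because the hotel at index 2 might dominate it; asking the crowd for that single missing distance would settle both uncertain labels, while the "in" and "out" tuples need no crowd attention at all.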

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lofi, C., El Maarry, K., Balke, WT. (2013). Skyline Queries over Incomplete Data - Error Models for Focused Crowd-Sourcing. In: Ng, W., Storey, V.C., Trujillo, J.C. (eds) Conceptual Modeling. ER 2013. Lecture Notes in Computer Science, vol 8217. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41924-9_25

  • DOI: https://doi.org/10.1007/978-3-642-41924-9_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41923-2

  • Online ISBN: 978-3-642-41924-9

  • eBook Packages: Computer Science, Computer Science (R0)