Source Selection in Large Scale Data Contexts: An Optimization Approach

  • Conference paper
Database and Expert Systems Applications (DEXA 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6261)

Abstract

This paper presents OptiSource, a novel approach to source selection that reduces the number of data sources accessed during query evaluation in large-scale distributed data contexts. These contexts are typical of large-scale Virtual Organizations (VOs), where autonomous organizations share data about a group of domain concepts (e.g. patient, gene). The instances of such concepts are constructed from non-disjoint fragments provided by several local data sources. These sources overlap in an uncontrolled way, which makes data location uncertain. This fact, together with the absence of reliable statistics on source contents and the large number of sources, makes current proposals unsuitable in terms of response quality and/or response time. OptiSource optimizes source selection by taking advantage of organizational aspects of VOs to predict the benefit of using a source. It uses an optimization model to identify the sets of sources that maximize benefit and minimize the number of sources to contact while satisfying resource constraints. Tests performed with the OptiSource prototype show that the precision and recall of source selection are substantially improved.
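
The kind of trade-off described in the abstract (maximize predicted benefit, contact few sources, respect resource constraints) can be illustrated with a small sketch. This is not the OptiSource model, which is not given on this page; it is a toy brute-force selection under stated assumptions. The names `Source`, `select_sources`, `budget`, and `per_source_penalty`, the example sources, and all numeric values are hypothetical, and benefits are assumed to be additive, which ignores the overlap between sources that the paper is concerned with.

```python
from itertools import combinations
from typing import NamedTuple


class Source(NamedTuple):
    name: str
    benefit: float  # hypothetical predicted benefit of contacting this source
    cost: float     # hypothetical resource cost of contacting this source


def select_sources(sources, budget, per_source_penalty=0.1):
    """Exhaustively pick the subset with the best penalized benefit.

    Score = sum of benefits - per_source_penalty * number of sources,
    subject to total cost <= budget. Exponential in len(sources), so it is
    only suitable for a handful of sources; a real system would use an
    integer programming solver instead.
    """
    best_subset, best_score = (), 0.0  # the empty selection scores 0
    for k in range(1, len(sources) + 1):
        for subset in combinations(sources, k):
            if sum(s.cost for s in subset) > budget:
                continue  # violates the resource constraint
            score = sum(s.benefit for s in subset) - per_source_penalty * len(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return [s.name for s in best_subset], best_score


# Made-up example data, for illustration only.
candidates = [
    Source("hospital_a", benefit=0.9, cost=2),
    Source("hospital_b", benefit=0.6, cost=1),
    Source("lab_c", benefit=0.05, cost=1),
    Source("clinic_d", benefit=0.7, cost=3),
]

names, score = select_sources(candidates, budget=4)
print(names, round(score, 2))
```

With these made-up numbers the low-benefit source `lab_c` is left out even though it fits within the budget, because the per-source penalty outweighs its small contribution.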

This research was supported by the project Ecos-Colciencias C06M02.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pomares, A., Roncancio, C., Cung, VD., Abásolo, J., Villamil, MdP. (2010). Source Selection in Large Scale Data Contexts: An Optimization Approach. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15364-8_4

  • DOI: https://doi.org/10.1007/978-3-642-15364-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15363-1

  • Online ISBN: 978-3-642-15364-8

  • eBook Packages: Computer Science, Computer Science (R0)
