Challenges for Dataset Search

Maier, David; Megler, V. M.; Tufte, Kristin

doi:10.1007/978-3-319-05810-8_1


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8421)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications


Abstract

Ranked search of datasets has emerged as a need as shared scientific archives grow in size and variety. Our own investigations have shown that IR-style, feature-based relevance scoring can be an effective tool for data discovery in scientific archives. However, maintaining interactive response times as archives scale will be a challenge. We report here on our exploration of performance techniques for Data Near Here, a dataset search service. We present a sample of results evaluating filter-restart techniques in our system, including two variations, adaptive relaxation and contraction. We then outline further directions for research in this domain.
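The abstract names filter-restart, adaptive relaxation, and contraction without describing them. The following is a minimal sketch of the general filter-restart idea for top-k search, not the authors' implementation. It assumes a hypothetical relevance score that depends only on a dataset's distance to a 2-D query point and decreases strictly as that distance grows. A cheap radius filter selects candidates, and only the candidates are scored. If the filter admits fewer than k candidates, the radius is relaxed and the query restarted. If it admits too many to score cheaply, the radius is contracted. The names `Dataset`, `score`, `top_k`, `filter_restart_top_k`, and `max_candidates`, along with the square-root scaling rule and the 1.5x minimum growth, are illustrative assumptions. "Adaptive relaxation" and "contraction" are modeled here as one plausible reading of those terms.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical dataset summary: a name plus a 2-D footprint centroid."""
    name: str
    x: float
    y: float


def distance(ds: Dataset, qx: float, qy: float) -> float:
    return math.hypot(ds.x - qx, ds.y - qy)


def score(ds: Dataset, qx: float, qy: float) -> float:
    # Assumed relevance score: strictly decreasing in distance to the query.
    return 1.0 / (1.0 + distance(ds, qx, qy))


def top_k(cands, qx, qy, k):
    # Rank by descending score; break ties by name for deterministic output.
    return sorted(cands, key=lambda d: (-score(d, qx, qy), d.name))[:k]


def filter_restart_top_k(datasets, qx, qy, k, r0=1.0,
                         max_candidates=None, max_rounds=20):
    """Return (top-k datasets, rounds used) via a relaxed/contracted radius filter."""
    if max_candidates is None:
        max_candidates = 4 * k
    assert k >= 1 and max_candidates >= k
    lo, hi = 0.0, math.inf  # radii known to admit too few / too many candidates
    r = r0
    for rounds in range(1, max_rounds + 1):
        # Filter step: a real system would answer this with an index range query.
        cands = [d for d in datasets if distance(d, qx, qy) <= r]
        n = len(cands)
        if n < k and n < len(datasets):
            # Adaptive relaxation (assumed rule): with roughly uniform 2-D density
            # the count grows like r**2, so scale r by sqrt(k / n), at least 1.5x,
            # but never past the midpoint toward a radius known to be too large.
            lo = r
            r = min(r * max(1.5, math.sqrt(k / max(n, 1))), (r + hi) / 2)
            continue
        if n > max_candidates:
            # Contraction (assumed rule): too many candidates to score cheaply, so
            # shrink toward a target count between k and max_candidates, staying
            # above any radius already known to admit too few candidates.
            hi = r
            target = (k + max_candidates) / 2
            shrunk = r * math.sqrt(target / n)
            r = max(shrunk, (lo + r) / 2) if lo > 0 else shrunk
            continue
        # Every dataset outside radius r is farther away and therefore scores
        # lower, so the best k inside the filter are the true top k.
        return top_k(cands, qx, qy, k), rounds
    # Safety net: stop filtering and score every dataset.
    return top_k(datasets, qx, qy, k), max_rounds
```

The early stop is justified by monotonicity. Because the assumed score depends only on distance, any dataset outside the final radius must score lower than every candidate inside it. Once the filter admits at least k candidates, the loop can stop. A feature-based score that combines several features would need a bound on each feature for the same argument to hold.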

Author information

Authors and Affiliations

Computer Science Department, Portland State University, USA
David Maier, V. M. Megler & Kristin Tufte

Editor information

Editors and Affiliations

School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore, Singapore
Sourav S. Bhowmick
Department of Computer Science, Utah State University, Old Main Hill, 4205, 84322-4205, Logan, UT, USA
Curtis E. Dyreson
Department of Computer Science, Aalborg University, Selma Lagerløfs Vej 300, 9220, Aalborg Øst, Denmark
Christian S. Jensen
Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore
Mong Li Lee
Department of Computer Science, Udayana University, Jl. Kampus Unud Jimbaran Bali, 80364, Badung, Bali, Indonesia
Agus Muliantara
Information Systems Engineering, Christian-Albrechts-Universität zu Kiel, Olshausenstrasse 40, 24098, Kiel, Germany
Bernhard Thalheim

About this paper

Cite this paper

Maier, D., Megler, V.M., Tufte, K. (2014). Challenges for Dataset Search. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8421. Springer, Cham. https://doi.org/10.1007/978-3-319-05810-8_1

DOI: https://doi.org/10.1007/978-3-319-05810-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05809-2
Online ISBN: 978-3-319-05810-8
eBook Packages: Computer Science, Computer Science (R0)