Topic Modeling for Search and Exploration in Multivariate Research Data Repositories

  • Conference paper
Research and Advanced Technology for Digital Libraries (TPDL 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8092)

Abstract

Huge amounts of multivariate research data are produced and made publicly available in digital libraries. Little research has focused on similarity functions that take multivariate data documents as a whole into account. Such similarity functions are highly beneficial for users, because they make it possible to browse and query large collections of multivariate data using nearest-neighbor indexing. In this paper we tackle this challenge and propose a novel similarity function for multivariate data documents based on topic modeling. Building on a previously developed bag-of-words approach for multivariate data, we learn a topic model for a collection of multivariate data documents and represent each document as a mixture of topics. This representation is well suited to efficient nearest-neighbor indexing and clustering according to the topic distribution of a document. We present a use case in which we apply this approach to the retrieval of multivariate data in the field of climate research.
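
The pipeline outlined in the abstract (bag-of-words documents, a learned topic model, one topic mixture per document, nearest-neighbor lookup on those mixtures) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: latent Dirichlet allocation fitted by collapsed Gibbs sampling as the topic model, the Hellinger distance as the similarity function, a brute-force scan in place of a real nearest-neighbor index, and a toy corpus in which each integer stands for a hypothetical "feature word" extracted from a multivariate data document.

```python
import math
import random


def fit_lda(docs, n_topics, n_words, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed model, not the paper's code).

    docs is a list of documents, each a list of integer word IDs.
    Returns one smoothed topic mixture (list of floats) per document.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]             # topic counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                           # total words per topic
    assign = []                                            # topic of every word occurrence

    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Normalize the smoothed counts so that each mixture sums to 1.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (assumed metric)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def nearest_neighbors(query_idx, mixtures, k=2):
    """Brute-force k nearest documents to the query, as (distance, index) pairs."""
    ranked = sorted(
        (hellinger(mixtures[query_idx], m), j)
        for j, m in enumerate(mixtures)
        if j != query_idx
    )
    return ranked[:k]


# Toy corpus: word IDs 0-2 and 3-5 mimic two kinds of measurement patterns.
docs = [
    [0, 1, 2, 0, 1, 2, 0, 1],
    [0, 0, 1, 2, 2, 1, 0, 2],
    [3, 4, 5, 3, 4, 5, 3, 4],
    [5, 5, 4, 3, 3, 4, 5, 3],
]
mixtures = fit_lda(docs, n_topics=2, n_words=6)
neighbors = nearest_neighbors(0, mixtures, k=2)
print(mixtures)
print(neighbors)
```

Because every document is reduced to a short probability vector whose length equals the number of topics, the distance computation is cheap and independent of the original document size, which is what makes this kind of representation convenient for indexing and clustering.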

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Scherer, M., von Landesberger, T., Schreck, T. (2013). Topic Modeling for Search and Exploration in Multivariate Research Data Repositories. In: Aalberg, T., Papatheodorou, C., Dobreva, M., Tsakonas, G., Farrugia, C.J. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2013. Lecture Notes in Computer Science, vol 8092. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40501-3_39

  • DOI: https://doi.org/10.1007/978-3-642-40501-3_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40500-6

  • Online ISBN: 978-3-642-40501-3

  • eBook Packages: Computer Science, Computer Science (R0)
