
Using Summaries to Search and Visualize Distributed Resources Addressing Spatial and Multimedia Features

  • Focus article (Schwerpunktbeitrag)
  • Datenbank-Spektrum

Abstract

Summarization is an important means to cope with the challenges of big data. Summaries help to gain a first overview, characterize subsets, allow targeted access to data, and form the basis for visualization techniques. In the present article, we point out the role of summaries as well as potential application scenarios. Two kinds of summarization techniques are described: techniques for spatial data, as an example of specific low-dimensional techniques, and techniques for general metric spaces, as a generic example with a broad spectrum of applications. Furthermore, their use for resource selection and resource visualization in large distributed scenarios is outlined.

Notes

  1. Resource selection based on a single criterion, for example image content, is only a first step toward an effective retrieval system. When querying for multiple criteria, for example for an image with a particular content that was taken in a certain geographic region, criterion-specific resource rankings can be combined by applying a merging algorithm for ranked lists (cf. e.g. [2, 19]). Moreover, resource description and selection schemes can be designed that support content-based search and also preserve the geographic distribution of the images by integrating both content-based and geographic search criteria (cf. e.g. [14], which combines text and geographic information). However, such aspects are beyond the scope of this article.

  2. Note that we may use the term (non-)relevant differently from traditional IR, where it is closely tied to the concept of an information need. We speak of (non-)relevant database objects to indicate that they are (not) part of the final query result. Similarly, (non-)relevant feature space regions or resources do (not) contain database objects from the final result. This interpretation is consistent with, for example, [27].

  3. An overview of different summarization techniques using pivoting, aggregation, or both is given in [3].

  4. To fit into a spreadsheet, the allocation has been sampled. The first 300,000 resources are shown with their original values. Beyond that, three resources are sampled into one value. This is still a very accurate depiction of the real distribution, since the 300,001st-biggest resource has only 16 data points, and the counts decrease step by step to one for resource 2,491,785.

  5. We use the images of the MIRFLICKR-25000 collection [18]. CEDD [8] features are extracted and compared using the Hellinger distance. Images are assigned to resources by the Flickr user ID.
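The comparison step named in note 5 can be illustrated with a minimal sketch. It assumes each CEDD descriptor is available as a plain list of non-negative bin values and normalizes it to unit mass before applying the Hellinger distance; the exact normalization used in the article is not stated here, so this is an assumption. The function names `hellinger` and `group_by_owner` are hypothetical helpers introduced for illustration only.

```python
import math
from collections import defaultdict


def hellinger(p, q):
    """Hellinger distance between two non-negative histograms.

    Assumption: both histograms are normalized to sum to 1 before
    comparison, so the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    mass_p, mass_q = sum(p), sum(q)
    if mass_p <= 0 or mass_q <= 0:
        raise ValueError("histograms must have positive total mass")
    squared = sum(
        (math.sqrt(a / mass_p) - math.sqrt(b / mass_q)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / 2.0)


def group_by_owner(images):
    """Assign images to resources by owner ID.

    `images` is an iterable of (owner_id, histogram) pairs; the result
    maps each owner ID to the list of histograms of that resource.
    """
    resources = defaultdict(list)
    for owner_id, histogram in images:
        resources[owner_id].append(histogram)
    return dict(resources)
```

Because the Hellinger distance satisfies the metric postulates, it can be used with the metric space techniques the article addresses; grouping by owner ID mirrors the assignment of images to resources described in the note.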

References

  1. Becker B, Franciosa PG, Gschwind S, Ohler T, Thiemt G, Widmayer P (1991) An optimal algorithm for approximating a set of rectangles by two minimum area rectangles. Tech. rep., University of Freiburg, Freiburg

  2. Belkin NJ, Kantor P, Fox EA, Shaw JA (1995) Combining the evidence of multiple query representations for information retrieval. Inf Process Manage 31:431–448

  3. Blank D (2015) Resource description and selection for similarity search in metric spaces. University of Bamberg Press, Bamberg

  4. Blank D, Henrich A (2012) Describing and selecting collections of georeferenced media items in peer-to-peer information retrieval systems. In: Díaz L, Granell C, Huerta J (eds) Discovery of Geospatial Resources: methodologies, technologies, and emergent applications. IGI Global, Hershey

  5. Blank D, Henrich A (2013) Resource description and selection for range query processing in general metric spaces. In: 15. Fachtagung Datenbanksysteme für Business, Technologie und Web. GI, Magdeburg, pp 93–112

  6. Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th International Conference on Data Engineering. IEEE Computer Society, Washington DC, pp 421–430

  7. Bustos B, Keim D, Saupe D, Schreck T (2007) Content-based 3D object retrieval. IEEE Comput Graphics Appl 27:22–27

  8. Chatzichristofis SA, Boutalis YS (2008) CEDD: Color and Edge Directivity Descriptor: a Compact Descriptor for Image Indexing and Retrieval. In: Proc. of the 6th Intl. Conf. on Computer Vision Systems. Springer LNCS 5008, Berlin, pp 312–322

  9. Chávez E, Navarro G, Baeza-Yates R, Marroquín JL (2001) Searching in Metric Spaces. ACM Comput Surv 33:273–321

  10. Crespo A, Garcia-Molina H (2005) Semantic overlay networks for p2p systems. In: Proc. of the 3rd Intl. Workshop on Agents and Peer-to-Peer Computing. Springer LNCS 3601, Berlin, pp 1–13

  11. Cuenca-Acuna FM, Peery C, Martin RP, Nguyen TD (2003) PlanetP: using gossiping to build content addressable peer-to-peer information sharing communities. In: Proc. of the 12th Intl. Symp. on High Performance Distributed Computing. IEEE, Seattle, pp 236–246

  12. Doulkeridis C, Vlachou A, Nørvåg K, Vazirgiannis M (2010) Distributed semantic overlay networks. In: Shen X, Yu H, Buford J, Akon M (eds) Handbook of Peer-to-Peer Networking, Part IV. Springer Science+Business Media, Berlin, pp 463–494

  13. Eisenhardt M, Müller W, Henrich A, Blank D, El Allali S (2006) Clustering-based source selection for efficient image retrieval in peer-to-peer networks. In: Proc. of the 8th Intl. Symp. on Multimedia. IEEE, San Diego, pp 823–830

  14. Hariharan R, Hore B, Mehrotra S (2008) Discovering GIS sources on the web using summaries. In: Proc. of the 8th Joint Conf. on Digital Libraries. ACM/IEEE, Pittsburgh, pp 94–103

  15. Henrich A, Blank D (2012) Summarizing data collections by their spatial, temporal, textual and image footprint: techniques for source selection and beyond. Keynote Talk at the DFG SPP 1335 Text Workshop on Scalable Visual Analytics (held by: A. Henrich). Leipzig. November 15, 2012. www.uni-bamberg.de/fileadmin/uni/fakultaeten/wiai_lehrstuehle/medieninformatik/Dateien/Publikationen/2012/Henrich-Leipzig-2012.pdf (last visit: 8.10.2014)

  16. Hetland ML (2009) The basic principles of metric indexing. In: Coello CAC, Dehuri S, Ghosh S (eds) Swarm intelligence for multi-objective problems in data mining, chap. 9. Springer, Berlin, pp 199–232

  17. Hu X, Chiueh Tc, Shin KG (2009) Large-scale malware indexing using function-call graphs. In: Proc. of the 16th Intl. Conf. on Computer and Communications Security. ACM, New York, pp 611–620

  18. Huiskes MJ, Lew MS (2008) The MIR Flickr retrieval evaluation. In: Proc. of the 1st Intl. Conf. on Multimedia Information Retrieval. ACM, New York, pp 39–43

  19. Ilyas IF, Beskales G, Soliman MA (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv 40:11:1–11:58

  20. Kufer S, Blank D, Henrich A (2012) Techniken der ressourcenbeschreibung und -auswahl für das geographische information retrieval. In: Proc. of LWA Workshop. Dortmund. www.uni-bamberg.de/fileadmin/uni/fakultaeten/wiai_lehrstuehle/medieninformatik/Dateien/Publikationen/2012/kufer_2012_techniken.pdf (last visit: 1.10.2014)

  21. Kufer S, Blank D, Henrich A (2013) Using hybrid techniques for resource description and selection in the context of distributed geographic information retrieval. In: Proc. of the 13th Intl. Symp. on Advances in Spatial and Temporal Databases. Springer LNCS 8098, Munich, pp 330–347

  22. Kufer S, Henrich A (2014) Hybrid quantized resource descriptions for geospatial source selection. In: Proc. of the 4th Intl. Workshop on Location and the Web. ACM, Shanghai, pp 17–24

  23. Kunze M, Weske M (2011) Metric trees for efficient similarity search in large process model repositories. Business Process Manag Workshops 66:535–546

  24. Lu J (2007) Full-text federated search in peer-to-peer networks. Ph.D. thesis, Language Technologies Institute, School of Computer Science, Carnegie Mellon University. CMU-LTI-07-003

  25. Müller W, Eisenhardt M, Henrich A (2005) Scalable summary based retrieval in P2P networks. In: Proc. of the 14th Intl. Conf. on Information and Knowledge Management. ACM, Bremen, pp 586–593

  26. Samet H (2006) Foundations of Multidimensional and metric data structures. Morgan Kaufmann, San Francisco

  27. Skopal T, Lokoč J, Bustos B (2012) D-cache: universal distance cache for metric access methods. IEEE Trans Knowl Data Eng 24:868–881

  28. Yang B, Garcia-Molina H (2003) Designing a super-peer network. In: Proc. of the 19th Intl. Conf. on Data Engineering. IEEE, pp 49–60

  29. Zezula P, Amato G, Dohnal V, Batko M (2006) Similarity Search: The Metric Space Approach. Springer New York, Inc., Secaucus

Author information

Corresponding author

Correspondence to Daniel Blank.

About this article

Cite this article

Blank, D., Henrich, A. & Kufer, S. Using Summaries to Search and Visualize Distributed Resources Addressing Spatial and Multimedia Features. Datenbank Spektrum 16, 67–76 (2016). https://doi.org/10.1007/s13222-015-0210-5

  • DOI: https://doi.org/10.1007/s13222-015-0210-5
