Abstract
Summarization is an important means of coping with the challenges of big data. Summaries help to gain a first overview, characterize subsets, allow targeted access to data, and form the basis for visualization techniques. In this article, we point out the role of summaries and potential application scenarios. As examples, we describe summarization techniques for spatial data (representing specific low-dimensional techniques) and for general metric spaces (a generic case with a broad spectrum of applications). Furthermore, we outline their use for resource selection and resource visualization in large distributed scenarios.
Notes
Resource selection based on a single criterion, for example image content, is only a first step toward an effective retrieval system. When querying for multiple criteria, for example for an image with a particular content that was taken in a certain geographic region, criterion-specific resource rankings can be combined by applying a merging algorithm for ranked lists (cf. e.g. [2, 19]). Moreover, resource description and selection schemes can be designed that support content-based search and, in addition, preserve the geographic distribution of the images by integrating both content-based and geographic search criteria (cf. e.g. [14], which combines text and geographic information). However, such aspects are beyond the scope of this article.
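The combination of criterion-specific resource rankings mentioned in this note can be illustrated with a minimal sketch. It assumes a simple Borda-style count as the merging rule, which is only a stand-in and not necessarily one of the algorithms discussed in [2, 19]; the function name `merge_rankings` and the peer identifiers are hypothetical.

```python
from collections import defaultdict


def merge_rankings(rankings):
    """Fuse several ranked lists of resource ids with a Borda-style count.

    Assumption: each list is ordered from most to least promising, and a
    resource at position p in a list of length n receives n - p points.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, resource in enumerate(ranking):
            scores[resource] += n - position
    # Highest total first; ties are broken by resource id for determinism.
    return sorted(scores, key=lambda r: (-scores[r], r))


# Hypothetical rankings for one query: one by image content, one by region.
content_ranking = ["peer_a", "peer_b", "peer_c"]
geo_ranking = ["peer_b", "peer_c", "peer_a"]
merged = merge_rankings([content_ranking, geo_ranking])
```

Here `peer_b` ends up first because it ranks well under both criteria, while `peer_a` leads only the content-based list.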
Note that we may use the term (non-)relevant differently from traditional IR, where it is closely tied to the concept of an information need. We speak of (non-)relevant database objects to indicate that they are (not) part of the final query result. Similarly, (non-)relevant feature space regions or resources do (not) contain database objects from the final result. This interpretation is consistent with, for example, [27].
An overview of different summarization techniques using pivoting, aggregation, or both is given in [3].
To fit into a spreadsheet, the allocation has been sampled. The first 300,000 resources show the original values. Beyond that, every three resources are sampled into one value. This still depicts the real distribution accurately, since the 300,001st largest resource has only 16 data points, and the count decreases step by step to one for resource 2,491,785.
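The sampling scheme in this note can be sketched as follows. The sketch assumes that "sampled into one value" means keeping the first value of every group of three, which the note does not specify; the function name `sample_allocation` and the toy data are hypothetical.

```python
def sample_allocation(sizes, keep=300_000, step=3):
    """Thin out a descending list of resource sizes for plotting.

    The first `keep` values are retained unchanged. From the remainder,
    one value out of every `step` is kept (assumption: the first of each
    group, since the note does not say how a group is reduced).
    """
    head = sizes[:keep]
    tail = sizes[keep::step]
    return head + tail


# Toy data with a small `keep` so that the effect is visible.
toy_sizes = [9, 7, 5, 4, 3, 3, 2, 1, 1, 1]
sampled = sample_allocation(toy_sizes, keep=4, step=3)
```

Because the tail of the distribution is long and flat, dropping two of every three values there changes the shape of the curve very little.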
References
Becker B, Franciosa PG, Gschwind S, Ohler T, Thiemt G, Widmayer P (1991) An optimal algorithm for approximating a set of rectangles by two minimum area rectangles. Tech. rep., University of Freiburg, Freiburg
Belkin NJ, Kantor P, Fox EA, Shaw JA (1995) Combining the evidence of multiple query representations for information retrieval. Inf Process Manage 31:431–448
Blank D (2015) Resource description and selection for similarity search in metric spaces. University of Bamberg Press, Bamberg
Blank D, Henrich A (2012) Describing and selecting collections of georeferenced media items in peer-to-peer information retrieval systems. In: Díaz L, Granell C, Huerta J (eds) Discovery of Geospatial Resources: methodologies, technologies, and emergent applications. IGI Global, Hershey
Blank D, Henrich A (2013) Resource description and selection for range query processing in general metric spaces. In: 15. Fachtagung Datenbanksysteme für Business, Technologie und Web, pp 93–112. GI, Magdeburg
Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th International Conference on Data Engineering. IEEE Computer Society, Washington DC, pp 421–430
Bustos B, Keim D, Saupe D, Schreck T (2007) Content-based 3d object retrieval. IEEE Comput Graphics Appl 27:22–27
Chatzichristofis SA, Boutalis YS (2008) CEDD: Color and Edge Directivity Descriptor: a Compact Descriptor for Image Indexing and Retrieval. In: Proc. of the 6th Intl. Conf. on Computer Vision Systems. Springer LNCS 5008, Berlin, pp 312–322
Chávez E, Navarro G, Baeza-Yates R, Marroquín JL (2001) Searching in Metric Spaces. ACM Comput Surv 33:273–321
Crespo A, Garcia-Molina H (2005) Semantic overlay networks for p2p systems. In: Proc. of the 3rd Intl. Workshop on Agents and Peer-to-Peer Computing. Springer LNCS 3601, Berlin, pp 1–13
Cuenca-Acuna FM, Peery C, Martin RP, Nguyen TD (2003) PlanetP: using gossiping to build content addressable peer-to-peer information sharing communities. In: Proc. of the 12th Intl. Symp. on High Performance Distributed Computing. IEEE, Seattle, pp 236–246
Doulkeridis C, Vlachou A, Nørvåg K, Vazirgiannis M (2010) Distributed semantic overlay networks. In: Shen X, Yu H, Buford J, Akon M (eds) Handbook of Peer-to-Peer Networking, Part IV. Springer Science+Business Media, Berlin, pp 463–494
Eisenhardt M, Müller W, Henrich A, Blank D, El Allali S (2006) Clustering-based source selection for efficient image retrieval in peer-to-peer networks. In: Proc. of the 8th Intl. Symp. on Multimedia. IEEE, San Diego, pp 823–830
Hariharan R, Hore B, Mehrotra S (2008) Discovering GIS sources on the web using summaries. In: Proc. of the 8th Joint Conf. on Digital Libraries. ACM/IEEE, Pittsburgh, pp 94–103
Henrich A, Blank D (2012) Summarizing data collections by their spatial, temporal, textual and image footprint: techniques for source selection and beyond. Keynote Talk at the DFG SPP 1335 Text Workshop on Scalable Visual Analytics (held by: A. Henrich). Leipzig. November 15, 2012. www.uni-bamberg.de/fileadmin/uni/fakultaeten/wiai_lehrstuehle/medieninformatik/Dateien/Publikationen/2012/Henrich-Leipzig-2012.pdf (last visit: 8.10.2014)
Hetland ML (2009) The basic principles of metric indexing. In: Coello CAC, Dehuri S, Ghosh S (eds) Swarm intelligence for multi-objective problems in data mining, chap. 9. Springer, Berlin, pp 199–232
Hu X, Chiueh Tc, Shin KG (2009) Large-scale malware indexing using function-call graphs. In: Proc. of the 16th Intl. Conf. on Computer and Communications Security. ACM, New York, pp 611–620
Huiskes MJ, Lew MS (2008) The MIR Flickr Retrieval Evaluation. In: Proc. of the 1st Intl. Conf. on Multimedia Information Retrieval. ACM, New York, pp 39–43
Ilyas IF, Beskales G, Soliman MA (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surveys 40:11:1–11:58
Kufer S, Blank D, Henrich A (2012) Techniken der ressourcenbeschreibung und -auswahl für das geographische information retrieval. In: Proc. of LWA Workshop. Dortmund. www.uni-bamberg.de/fileadmin/uni/fakultaeten/wiai_lehrstuehle/medieninformatik/Dateien/Publikationen/2012/kufer_2012_techniken.pdf (last visit: 1.10.2014)
Kufer S, Blank D, Henrich A (2013) Using hybrid techniques for resource description and selection in the context of distributed geographic information retrieval. In: Proc. of the 13th Intl. Symp. on Advances in Spatial and Temporal Databases. Springer LNCS 8098, Munich, pp 330–347
Kufer S, Henrich A (2014) Hybrid quantized resource descriptions for geospatial source selection. In: Proc. of the 4th Intl. Workshop on Location and the Web. ACM, Shanghai, pp 17–24
Kunze M, Weske M (2011) Metric trees for efficient similarity search in large process model repositories. Business Process Manag Workshops 66:535–546
Lu J (2007) Full-text federated search in peer-to-peer networks. Ph.D. thesis, Language Technologies Institute, School of Computer Science, Carnegie Mellon University. CMU-LTI-07-003
Müller W, Eisenhardt M, Henrich A (2005) Scalable summary based retrieval in P2P networks. In: Proc. of the 14th Intl. Conf. on Information and Knowledge Management. ACM, Bremen, pp 586–593
Samet H (2006) Foundations of Multidimensional and metric data structures. Morgan Kaufmann, San Francisco
Skopal T, Lokoč J, Bustos B (2012) D-cache: universal distance cache for metric access methods. IEEE Trans Knowl Data Eng 24:868–881
Yang B, Garcia-Molina H (2003) Designing a super-peer network. In: Proc. of the 19th Intl. Conf. on Data Engineering, pp 49–60. IEEE
Zezula P, Amato G, Dohnal V, Batko M (2006) Similarity Search: The Metric Space Approach. Springer New York, Inc., Secaucus
Cite this article
Blank, D., Henrich, A. & Kufer, S. Using Summaries to Search and Visualize Distributed Resources Addressing Spatial and Multimedia Features. Datenbank Spektrum 16, 67–76 (2016). https://doi.org/10.1007/s13222-015-0210-5
DOI: https://doi.org/10.1007/s13222-015-0210-5