Abstract
Analysis of contemporary Big Data collections requires effective and efficient content-based access to data, which is usually unstructured. This first implies the need to uncover descriptive knowledge of complex and heterogeneous objects to make them findable. Second, multimodal search structures are needed to efficiently execute complex similarity queries, possibly in outsourced environments, while preserving privacy. After explaining the impact of Big Data on similarity searching and summarizing the state of the art in search technology, four specific research objectives to tackle the challenges are outlined and discussed. It is believed that effective and efficient processing of raw data for object findability and the development of hybrid similarity search structures for multimodal and privacy-preserving searching are necessary to achieve a scalable similarity search technology able to operate on Big Data.
Acknowledgments
This research was supported by the Czech Science Foundation project number P103/12/G084.
Cite this article
Zezula, P. Similarity Searching for the Big Data. Mobile Netw Appl 20, 487–496 (2015). https://doi.org/10.1007/s11036-014-0547-2
DOI: https://doi.org/10.1007/s11036-014-0547-2