Similarity Searching for the Big Data

Challenges and Research Objectives

Mobile Networks and Applications

Abstract

Analysis of contemporary Big Data collections requires effective and efficient content-based access to data, which are usually unstructured. First, this implies the need to uncover descriptive knowledge of complex and heterogeneous objects to make them findable. Second, multimodal search structures are needed to execute complex similarity queries efficiently, possibly in outsourced environments, while preserving privacy. After explaining the impact of Big Data on similarity searching and summarizing the state of the art in search technology, the paper outlines and discusses four specific research objectives that tackle these challenges. It argues that effective and efficient processing of raw data for object findability, together with hybrid similarity search structures for multimodal and privacy-preserving searching, is necessary to achieve a scalable similarity search technology able to operate on Big Data.
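To make "efficiently execute similarity queries" a little more concrete, here is a minimal sketch of one textbook pruning idea, not the search structures proposed in the article. It assumes, for illustration only, that objects are vectors compared by Euclidean distance (a metric). Distances from every object to a few pivots are precomputed; at query time the triangle inequality gives the lower bound d(q, o) >= |d(q, p) - d(o, p)|, so any object whose bound exceeds the query radius is discarded without computing its distance to the query. `PivotIndex` and `range_query` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance; any metric satisfying the triangle inequality would work."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Hypothetical pivot-based index storing each object's distances to the pivots."""

    def __init__(self, data: List[Vector], pivots: List[Vector],
                 dist: Callable[[Sequence[float], Sequence[float]], float] = euclidean):
        self.data = data
        self.pivots = pivots
        self.dist = dist
        # Precompute d(o, p) for every object o and pivot p, once, at indexing time.
        self.table = [[dist(o, p) for p in pivots] for o in data]
        self.distance_computations = 0

    def range_query(self, q: Vector, radius: float) -> List[int]:
        """Return the indices of all objects o with d(q, o) <= radius."""
        q_to_p = [self.dist(q, p) for p in self.pivots]
        self.distance_computations = len(self.pivots)
        result = []
        for i, o in enumerate(self.data):
            # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)| for every pivot p.
            lower_bound = max(abs(qp - op) for qp, op in zip(q_to_p, self.table[i]))
            if lower_bound > radius:
                continue  # pruned without computing d(q, o)
            self.distance_computations += 1
            if self.dist(q, o) <= radius:
                result.append(i)
        return result


data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
index = PivotIndex(data, pivots=[(0.0, 0.0), (5.0, 5.0)])
print(index.range_query((0.5, 0.5), radius=1.0))  # [0, 1, 2]
```

On this toy data the two points near (5, 5) are eliminated by their lower bounds alone, so only three object distances are evaluated after the two query-to-pivot distances. On real collections the saving depends on how well the chosen pivots spread the data, which is one reason the choice of search structure matters at Big Data scale.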

Fig. 1
Fig. 2

Notes

  1. http://www.mir2ed.org/

  2. http://www.nmis.isti.cnr.it/amato/similarity-search-book/

Acknowledgments

This research was supported by the Czech Science Foundation project number P103/12/G084.

Author information

Correspondence to Pavel Zezula.

About this article

Cite this article

Zezula, P. Similarity Searching for the Big Data. Mobile Netw Appl 20, 487–496 (2015). https://doi.org/10.1007/s11036-014-0547-2

  • DOI: https://doi.org/10.1007/s11036-014-0547-2