DOI: 10.1145/2857546.2857607
research-article

An Index Scheme for Similarity Search on Cloud Computing using MapReduce over Docker Container

Published: 04 January 2016

Abstract

We consider the problem of similarity search over large datasets in a distributed environment. The proposed framework integrates the VP-Tree algorithm on top of the MapReduce framework to achieve good performance while meeting the system's scalability and fault-tolerance requirements as the data scales up. Since the VP-Tree algorithm was originally implemented for partitioning and searching data on a local disk, we propose a new approach for using it in a parallel environment. The key point of the VP-Tree algorithm is that it distributes similar data points into groups, thereby reducing the number of data points that must be scanned during the search stage. Consequently, the response time of the entire system improves. In addition, we use the open-source computer vision library OpenCV to detect similarity among images in the dataset. We evaluate the performance of the proposed framework on a synthetic dataset to demonstrate the benefits of our approach. The experiments show that the proposed framework achieves a 57% improvement in response time compared with running the search job on the traditional Hadoop framework. We also compare the running time of our application on Docker containers against a VM-based environment. The results indicate that deploying our system on Docker containers provides better performance than the VM-based environment in terms of response time.
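The grouping and pruning idea behind a VP-Tree can be illustrated with a small in-memory sketch. This is not the paper's MapReduce implementation: the names `Node`, `build`, and `range_search` are hypothetical, Euclidean distance is an assumed metric, and the vantage point is chosen at random for simplicity. It only shows how splitting points around a vantage point lets a search skip whole groups.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any metric obeying the triangle inequality works."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class Node:
    vantage: Point             # the vantage point stored at this node
    radius: float              # median distance from the vantage point
    inside: Optional["Node"]   # subtree of points with distance <= radius
    outside: Optional["Node"]  # subtree of points with distance > radius


def build(points: List[Point], rng: Optional[random.Random] = None) -> Optional[Node]:
    """Recursively split the points around a randomly chosen vantage point."""
    if not points:
        return None
    rng = rng or random.Random(0)
    rest = list(points)
    vantage = rest.pop(rng.randrange(len(rest)))
    if not rest:
        return Node(vantage, 0.0, None, None)
    dists = [euclidean(vantage, p) for p in rest]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(rest, dists) if d <= radius]
    outside = [p for p, d in zip(rest, dists) if d > radius]
    return Node(vantage, radius, build(inside, rng), build(outside, rng))


def range_search(node, query, eps, visited=None) -> List[Point]:
    """Return every stored point within distance eps of query."""
    if node is None:
        return []
    if visited is not None:
        visited.append(node.vantage)  # record which nodes were actually scanned
    d = euclidean(query, node.vantage)
    found = [node.vantage] if d <= eps else []
    # Triangle inequality: inside points are at least d - radius from the query,
    # so the inside subtree can only hold matches when d - eps <= radius.
    if d - eps <= node.radius:
        found += range_search(node.inside, query, eps, visited)
    # Outside points are more than radius - d from the query, so the outside
    # subtree can only hold matches when d + eps > radius.
    if d + eps > node.radius:
        found += range_search(node.outside, query, eps, visited)
    return found


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.random(), rng.random()) for _ in range(1000)]
    tree = build(data)
    scanned = []
    hits = range_search(tree, (0.5, 0.5), 0.05, scanned)
    print(f"{len(hits)} matches, {len(scanned)} of {len(data)} points scanned")
```

Running the script prints how many points were scanned compared with the dataset size. In a distributed setting, the same partitioning could plausibly decide which data groups a search task needs to read, but that mapping is an assumption here and is not taken from the abstract.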

      Published In

      IMCOM '16: Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication
      January 2016
      658 pages
      ISBN:9781450341424
      DOI:10.1145/2857546
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. MapReduce
      2. Similarity search
      3. index scheme

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      IMCOM '16

      Acceptance Rates

      Overall Acceptance Rate 213 of 621 submissions, 34%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 2
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 08 Feb 2025

      Citations

      Cited By

      • (2020) Autoscaled RabbitMQ Kubernetes Cluster on single-board computers. 2020 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR), pages 1-6. DOI: 10.1109/AQTR49680.2020.9129886. Online publication date: May 2020.
      • (2020) The state‐of‐the‐art in container technologies: Application, orchestration and security. Concurrency and Computation: Practice and Experience, 32:17. DOI: 10.1002/cpe.5668. Online publication date: 19 Jan 2020.
      • (2019) A Lightweight Indexing Approach for Efficient Batch Similarity Processing with MapReduce. SN Computer Science, 1:1. DOI: 10.1007/s42979-019-0007-y. Online publication date: 25 Jun 2019.
      • (2019) A study on performance measures for auto-scaling CPU-intensive containerized applications. Cluster Computing, 22:3, pages 995-1006. DOI: 10.1007/s10586-018-02890-1. Online publication date: 1 Sep 2019.
      • (2018) Container Orchestration: A Survey. Systems Modeling: Methodologies and Tools, pages 221-235. DOI: 10.1007/978-3-319-92378-9_14. Online publication date: 17 Oct 2018.
      • (2018) An Efficient Batch Similarity Processing with MapReduce. Future Data and Security Engineering, pages 158-171. DOI: 10.1007/978-3-030-03192-3_12. Online publication date: 27 Oct 2018.
      • (2017) Measuring Docker Performance. Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, pages 11-16. DOI: 10.1145/3053600.3053605. Online publication date: 18 Apr 2017.
      • (2017) Elastic Provisioning of Virtual Machines for Container Deployment. Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, pages 5-10. DOI: 10.1145/3053600.3053602. Online publication date: 18 Apr 2017.
      • (2017) Auto-Scaling of Containers: The Impact of Relative and Absolute Metrics. 2017 IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W), pages 207-214. DOI: 10.1109/FAS-W.2017.149. Online publication date: Sep 2017.
