Massive picture retrieval system based on big data image mining

https://doi.org/10.1016/j.future.2021.03.010

Highlights

  • Picture information is rapidly obtained through big data image mining.

  • Required pictures are retrieved through corresponding clusters.

  • Output values of the proposed VQA model are normalized to the range [0,1].

Abstract

Traditional picture retrieval systems suffer from slow retrieval speed, poor retrieval accuracy, and low recall when performing massive picture retrieval. In this paper, we design a massive picture retrieval system using big data image mining technology. The system consists of a data processing layer, a business logic layer, and a presentation layer, and works through three steps: data segmentation, mining, and merging. Specifically, it runs the distributed file system module in a Master/Slave operation mode and designs file read and write requests according to user interaction. Next, it performs parallel computing on picture data sets with the MapReduce module to solve the picture matching and similarity metrics, and returns the sorted picture matching results to the user. Then, it extracts the color and texture features of the target area to generate the final picture retrieval result. We select a large number of pictures from a big data platform as the simulation test set. The results show that the designed system achieves good retrieval accuracy and high retrieval speed, and greatly improves the recall of picture retrieval.
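The three steps of segmentation, mining, and merging can be illustrated with a small single-process sketch. It is not the authors' implementation: the function names, the cosine similarity measure, and the toy feature vectors are assumptions made for illustration, and a real deployment would run the per-partition step on a MapReduce cluster rather than in a loop.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def split(items, n_parts):
    """Segmentation step: cut the data set into roughly equal partitions."""
    size = max(1, math.ceil(len(items) / n_parts))
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_partition(partition, query, k):
    """Mining step: score one partition and keep its local top-k."""
    scored = ((cosine_similarity(query, feat), name) for name, feat in partition)
    return heapq.nlargest(k, scored)


def merge(partials, k):
    """Merging step: combine the local top-k lists into a global top-k."""
    return heapq.nlargest(k, (hit for part in partials for hit in part))


def retrieve(dataset, query, k=3, n_parts=2):
    """Return the names of the k pictures most similar to the query."""
    partials = [map_partition(p, query, k) for p in split(dataset, n_parts)]
    return [name for _, name in merge(partials, k)]


# Hypothetical toy features, e.g. a 3-bin color histogram per picture.
dataset = [
    ("sunset.jpg", [0.9, 0.1, 0.0]),
    ("forest.jpg", [0.1, 0.8, 0.1]),
    ("beach.jpg", [0.7, 0.2, 0.1]),
    ("ocean.jpg", [0.0, 0.2, 0.8]),
]
print(retrieve(dataset, query=[1.0, 0.1, 0.0], k=2))
```

Because each partition keeps its own top-k before merging, the merged result is the same as ranking the whole data set at once, regardless of how many partitions are used.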

Introduction

The rapid development of information technology has accelerated the arrival of the “Internet+” big data era [1], and pictures are gradually replacing the text-based web page structure. With the continuous development of mobile Internet devices and network technology, pictures used for information communication emerge endlessly. As a result, the demand for picture retrieval technology [2] grows day by day. Big data features high data dimensionality and massive information; therefore, how to rapidly and accurately find the information users want in massive picture resources poses great challenges to retrieval systems and has gained wide attention from researchers.

To retrieve massive image information efficiently and rapidly, the literature [3] constructs a massive image parallel retrieval method under the BoVW (Bag of Visual Words) model based on the Spark big data platform. The model is used to execute the image pre-processing operation and store the processing results, and image similarity is matched through the parallel retrieval of the platform. For massive face images, the literature [4] proposes a retrieval algorithm based on deep feature clustering, trains a deep convolutional neural network model by category on the training set, and uses the triplet loss strategy to adjust the network model. The extracted deep features are then divided by the k-means clustering algorithm, and the similarity of the deep features is matched. Furthermore, the deep features are integrated through the query expansion technique to realize face image retrieval. The literature [5] adopts visual and textual content as features and proposes an image retrieval method that integrates the image feature vectors after dividing the image into text and non-text regions. The extracted BoVW is then used to retrieve similar images by image, by keyword, or by a combination of both.

However, the methods in the above literature cannot fully cope with searching for pictures by picture, searching for texts by picture, or searching for pictures by text, which may lead to lower retrieval precision and longer retrieval time. Therefore, in this paper, efficient picture information is rapidly obtained through big data image mining, and the required pictures are retrieved from the corresponding clusters.
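Retrieval within the corresponding cluster can be sketched as follows. This is only an illustration under stated assumptions: the cluster centroids are taken as already computed by some clustering step, and the names `nearest_cluster` and `search_in_cluster`, the Euclidean distance, and the toy feature vectors are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_cluster(query, centroids):
    """Return the id of the cluster whose centroid is closest to the query."""
    return min(centroids, key=lambda cid: euclidean(query, centroids[cid]))


def search_in_cluster(query, centroids, members, k=2):
    """Rank only the pictures of the query's cluster, closest first."""
    cid = nearest_cluster(query, centroids)
    ranked = sorted(members[cid], key=lambda item: euclidean(query, item[1]))
    return cid, [name for name, _ in ranked[:k]]


# Hypothetical precomputed centroids and cluster members (2-D features).
centroids = {"warm": [0.8, 0.2], "cool": [0.2, 0.8]}
members = {
    "warm": [("sunset.jpg", [0.9, 0.1]), ("desert.jpg", [0.7, 0.3])],
    "cool": [("ocean.jpg", [0.1, 0.9]), ("glacier.jpg", [0.3, 0.7])],
}
print(search_in_cluster([0.85, 0.15], centroids, members, k=1))
```

Restricting the ranking to one cluster trades a small risk of missing a match in a neighbouring cluster for far fewer distance computations per query.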

Section snippets

Image segmentation

Assume that the minimum cut edge set [6] divides the graph $G=(V,E)$, consisting of vertex set $V$ and edge set $E$, into partitions. Among them, one of the partitions of the vertex set, $E_c=(\sigma,T)$, is the graph existing between it and the minimum cut edge set. In addition, the capacity expression of the cut edge set $\sigma_1,\sigma_2,\ldots,\sigma_n$ is as shown below:

$E_c:\ \mathrm{Capacity}=\sum_{u\in\sigma,\ v\in T}$

It is hard to merge the mode with the method mentioned in the above expression, which causes the lack of mode. Therefore, it is required
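In graph-cut segmentation, the capacity of a cut is conventionally the total weight of the edges that cross between the two vertex sets. The sketch below computes that quantity for a toy graph. It assumes an undirected graph stored as a dictionary of edge weights; the weights, the vertex labels, and the function name `cut_capacity` are illustrative assumptions, not taken from the paper.

```python
def cut_capacity(edges, sigma, t):
    """Sum the weights of edges with one endpoint in sigma and the other in t.

    edges maps an undirected edge (u, v) to its weight (an assumed format).
    """
    sigma, t = set(sigma), set(t)
    total = 0.0
    for (u, v), weight in edges.items():
        crosses = (u in sigma and v in t) or (u in t and v in sigma)
        if crosses:
            total += weight
    return total


# Hypothetical 4-vertex graph; vertices could stand for pixels or regions.
edges = {
    ("a", "b"): 3.0,
    ("a", "c"): 1.0,
    ("b", "c"): 2.0,
    ("c", "d"): 4.0,
}
# Edges a-c and b-c cross the partition, so the capacity is 1.0 + 2.0.
print(cut_capacity(edges, {"a", "b"}, {"c", "d"}))
```

A minimum cut is the partition for which this sum is smallest; here only the evaluation of one given partition is shown, not the search for the minimum.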

Picture retrieval system framework

In order to realize the timely response and rapid retrieval of massive image resources, the massive picture retrieval system is built at the network layer based on the distributed framework as shown in Fig. 1.

The user in the presentation layer uploads images through the Internet and receives the server processing results from the business logic layer. According to the user’s retrieval requests, the business logic layer implements the specific business processing. The data processing layer

Retrieval system simulation experiment

Tens of thousands of pictures are extracted from a given big data platform and used as the test set. The systems of literature [3], literature [4], and literature [5] and the proposed system are then used to carry out the retrieval simulation experiment and verify the picture retrieval performance of the system.
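Retrieval performance of this kind is commonly summarized by precision and recall. The sketch below uses the standard definitions on a single query; the picture names are made up, and the paper's exact evaluation protocol is not reproduced here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant pictures retrieved / all pictures retrieved
    recall    = relevant pictures retrieved / all relevant pictures
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 pictures returned, 3 pictures truly relevant.
retrieved = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
relevant = ["a.jpg", "b.jpg", "e.jpg"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four returned pictures are relevant and two of the three relevant pictures were found, which is why the two measures differ.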

Conclusion

Although massive picture information facilitates communication, it also causes information redundancy, and it takes people longer to find the pictures they want. Therefore, in this paper, we construct a massive picture retrieval system based on big data image mining. The following conclusions have been obtained through the experiments:

(1) When the number of pictures is 6 105, the image retrieval recall of the proposed system is as high as 97%. It demonstrates

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Kun Zhang

References (20)

  • Unar S. et al. A decisive content based image retrieval approach for feature fusion in visual and textual images. Knowl.-Based Syst. (2019)
  • Zheng C. et al. Learn to cache: Machine learning for network edge caching in the big data era. IEEE Wirel. Commun. (2018)
  • Mei Y. et al. Review of image retrieval technologies in encrypted domain. Huanan Ligong Daxue Xuebao J. South China Univ. Technol. (2018)
  • Jian Cao et al. Parallel retrieval of massive images based on Apache Spark. J. Comput. Appl. (2018)
  • Zhendong Li et al. Massive face image retrieval based on deep feature clustering. J. Harbin Inst. Technol. (2018)
  • Bickle Allan et al. Minimum edge cuts in diameter 2 graphs. Discuss. Math. Graph Theory (2019)
  • Kim Y.Y. et al. Implementation of hybrid P2P networking distributed web crawler using AWS for smart work news big data. Peer-to-Peer Netw. Appl. (2020)
  • Li J.Y. et al. Prompt image search with deep convolutional neural network via efficient hashing code and addictive latent semantic layer. J. Int. Technol. (2018)
  • Ahmed M. et al. Infrequent pattern mining in smart healthcare environment using data summarization. J. Supercomput. (2018)
  • Siddiqui I.F. et al. Pseudo-cache-based IoT small files management framework in HDFS cluster. Wirel. Pers. Commun. (2020)
There are more references available in the full text version of this article.

Kun Zhang

Kai Chen

Binghui Fan
