Abstract
Large image collections are used in many modern applications. In this paper, we aim to capture the intrinsic dissimilarities of image descriptors in large image collections, i.e., to detect dissimilar (diverse) images without defining an explicit similarity or distance measure. To this end, we adopt skyline processing techniques for large image databases, operating on their high-dimensional descriptor vectors. The novelty of the proposed methodology lies in combining skyline techniques with state-of-the-art hashing schemes, which enable effective data partitioning and indexing in secondary memory and thus support large image databases. The proposed approach is evaluated experimentally on three real-world image datasets. The results demonstrate that images lying on the skyline have significantly different characteristics, which depend on the type of descriptor. Thus, these skyline items may be used as seeds for clustering in large image databases. In addition, we observe that skyline processing using hash-based indexing structures is significantly faster than index-free skyline computation and also more efficient than skyline computation with hierarchical indexing structures. Based on our results, the proposed approach is both efficient (in runtime) and effective (with respect to image diversity), and can therefore serve as a basis for more complex data mining tasks such as clustering.
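As a minimal illustration of the skyline notion referred to above, the following Python sketch computes the skyline of a small set of descriptor vectors by a naive pairwise dominance check. It is not the hash-based method proposed in the paper: the function names `dominates` and `skyline`, the sample vectors, and the convention that smaller values are preferred in every dimension are all assumptions made for illustration only.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b: a is no worse than b in every
    dimension and strictly better in at least one.
    Assumption: smaller values are preferred in all dimensions."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(vectors: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of the vectors that no other vector dominates.
    Naive O(n^2) scan without any index; for illustration only."""
    result = []
    for i, v in enumerate(vectors):
        dominated = any(
            dominates(w, v) for j, w in enumerate(vectors) if j != i
        )
        if not dominated:
            result.append(i)
    return result


# Hypothetical 3-dimensional descriptor vectors (not real image data).
descriptors = [
    [0.1, 0.9, 0.5],
    [0.2, 0.95, 0.6],  # dominated by the first vector
    [0.8, 0.1, 0.4],
    [0.5, 0.5, 0.5],
    [0.9, 0.9, 0.9],   # dominated by the first vector
]

skyline_ids = skyline(descriptors)
print(skyline_ids)
```

Note that no distance or similarity function appears anywhere in the sketch: only per-dimension comparisons are used, which is what allows diverse items to be identified without an explicit measure. The quadratic scan is impractical for large collections, which is the case the indexing schemes discussed in the abstract are meant to address.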
Cite this article
Georgiadis, N., Tiakas, E., Manolopoulos, Y. et al. Skyline-based dissimilarity of images. J Intell Inf Syst 53, 509–545 (2019). https://doi.org/10.1007/s10844-019-00571-y
DOI: https://doi.org/10.1007/s10844-019-00571-y