
Indexing and Integrating Multiple Features for WWW Images

World Wide Web

Abstract

In this paper, we present a novel indexing technique called Multi-scale Similarity Indexing (MSI) that indexes an image's multiple features in a single one-dimensional structure. For both the text and visual feature spaces, the similarity between a point and the center of its local partition in the individual space is used as the indexing key, and similarity values in different features are distinguished by different scales. A single indexing tree can then be built on these keys. Because relevant images have similar similarity values with respect to the center of the same local partition in any feature space, a certain number of irrelevant images can be pruned quickly using the triangle inequality on the indexing keys. To remove the “dimensionality curse” that affects high-dimensional structures, we propose a new technique called Local Bit Stream (LBS). LBS transforms an image's text and visual feature representations into simple, uniform and effective bit stream (BS) representations based on the local partition's center. Such BS representations are small in size and fast to compare, since only bit operations are involved. By comparing the common bits in two BSs, most irrelevant images can be filtered immediately. To integrate multiple features effectively, we also investigate the following evidence combination techniques: Certainty Factor, Dempster-Shafer Theory, Compound Probability, and Linear Combination. Our extensive experiments show that a single one-dimensional index on multiple features greatly outperforms multiple indices on multiple features. Our LBS method outperforms sequential scan on high-dimensional space by an order of magnitude. Certainty Factor and Dempster-Shafer Theory perform best in combining multiple similarities from the corresponding features.
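The key construction and triangle-inequality pruning described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: Euclidean distance to the partition center stands in for the paper's similarity measure, the constant `SCALE` and the names `make_key` and `OneDimIndex` are hypothetical, and a sorted list searched with `bisect` stands in for the one-dimensional index tree.

```python
import bisect
import math

# Hypothetical constant: must be larger than any distance to a partition center,
# so that keys of different (feature, partition) slots never overlap.
SCALE = 1000.0


def dist(a, b):
    """Euclidean distance (an assumed stand-in for the paper's similarity)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_key(feature_id, partition_id, partitions_per_feature, d):
    """Map (feature, partition, distance to center) to one scalar key."""
    slot = feature_id * partitions_per_feature + partition_id
    return slot * SCALE + d


class OneDimIndex:
    def __init__(self, centers):
        # centers[f][p] is the center of partition p in feature space f.
        self.centers = centers
        self.ppf = max(len(c) for c in centers)
        self.keys = []  # sorted scalar keys
        self.ids = []   # image ids, parallel to self.keys

    def insert(self, image_id, feature_id, vector):
        cs = self.centers[feature_id]
        # Assign the vector to its nearest partition center.
        p = min(range(len(cs)), key=lambda i: dist(vector, cs[i]))
        key = make_key(feature_id, p, self.ppf, dist(vector, cs[p]))
        i = bisect.bisect_right(self.keys, key)
        self.keys.insert(i, key)
        self.ids.insert(i, image_id)

    def candidates(self, feature_id, query, radius):
        """Ids whose keys survive triangle-inequality pruning.

        If dist(query, x) <= radius, then dist(x, c) must lie in
        [dist(query, c) - radius, dist(query, c) + radius] for x's center c,
        so everything outside that key range is skipped without being read.
        """
        out = set()
        for p, c in enumerate(self.centers[feature_id]):
            dq = dist(query, c)
            lo_d = max(0.0, dq - radius)
            hi_d = min(dq + radius, SCALE - 1e-9)  # stay inside this slot
            if lo_d > hi_d:
                continue
            lo = bisect.bisect_left(self.keys, make_key(feature_id, p, self.ppf, lo_d))
            hi = bisect.bisect_right(self.keys, make_key(feature_id, p, self.ppf, hi_d))
            out.update(self.ids[lo:hi])
        return out
```

The candidates returned are a superset of the true answers for that feature, so a real system would still verify each one against the original feature vector. The sketch only shows how one sorted structure can serve several feature spaces once each space is given its own key range.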

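The four evidence combination techniques named in the abstract can be sketched for the case of two similarity scores in [0, 1], for example one from the text feature and one from the visual feature. The formulas below are textbook forms and not necessarily the exact variants evaluated in the paper. In particular, the `trust` parameter that turns a similarity into Dempster-Shafer masses, the reading of Compound Probability as a product of independent probabilities, and all function names are assumptions made for illustration.

```python
def certainty_factor(s1, s2):
    """MYCIN-style combination of two positive certainty factors in [0, 1]."""
    return s1 + s2 * (1.0 - s1)


def to_masses(s, trust):
    """Assumed mapping from a similarity to masses over
    (relevant, irrelevant, unknown); `trust` is a hypothetical parameter."""
    return (trust * s, trust * (1.0 - s), 1.0 - trust)


def dempster_shafer(s1, s2, trust=0.8):
    """Dempster's rule on the two-element frame {relevant, irrelevant}.

    Returns the combined masses (relevant, irrelevant, unknown).
    """
    r1, i1, u1 = to_masses(s1, trust)
    r2, i2, u2 = to_masses(s2, trust)
    conflict = r1 * i2 + i1 * r2
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    rel = (r1 * r2 + r1 * u2 + u1 * r2) / norm
    irr = (i1 * i2 + i1 * u2 + u1 * i2) / norm
    unk = (u1 * u2) / norm
    return rel, irr, unk


def compound_probability(s1, s2):
    """Assumed reading: both scores treated as independent probabilities."""
    return s1 * s2


def linear_combination(s1, s2, w=0.5):
    """Weighted sum; w is the weight given to the first feature."""
    return w * s1 + (1.0 - w) * s2
```

For instance, `certainty_factor(0.6, 0.5)` and `linear_combination(0.6, 0.5)` rank the same pair of scores differently relative to other images, which is why the choice of combination rule can change the final ranking.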


Corresponding author

Correspondence to Heng Tao Shen.


About this article

Cite this article

Shen, H.T., Zhou, X. & Cui, B. Indexing and Integrating Multiple Features for WWW Images. World Wide Web 9, 343–364 (2006). https://doi.org/10.1007/s11280-006-8560-4
