research-article

Content-based retrieval for heterogeneous domains: domain adaptation by relative aggregation points

Authors:
Makoto P. Kato, Kyoto University, Kyoto, Japan
Hiroaki Ohshima, Kyoto University, Kyoto, Japan
Katsumi Tanaka, Kyoto University, Kyoto, Japan

SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, August 2012, pages 811–820. https://doi.org/10.1145/2348283.2348392

Published: 12 August 2012


ABSTRACT

We introduce the problem of domain adaptation for content-based retrieval and propose a domain adaptation method based on relative aggregation points (RAPs). Content-based retrieval, including image retrieval and spoken document retrieval, enables a user to input examples as a query and retrieves relevant data based on their similarity to the examples. However, input examples and relevant data can be dissimilar, especially when the domain from which the user selects examples differs from the domain from which the system retrieves data. In content-based geographic object retrieval, for example, suppose that a user who lives in Beijing visits Kyoto, Japan, and wants to use a content-based retrieval system to search for relatively inexpensive restaurants serving popular local dishes. Since such restaurants in Beijing and Kyoto are dissimilar due to differences in average cost and in each area's popular dishes, it is difficult to find relevant restaurants in Kyoto based on examples selected in Beijing. We propose a solution to this problem by assuming that RAPs in different domains correspond: they may be dissimilar, but they play the same role. A RAP is defined as the expectation of the instances in a domain that are classified into a certain class, e.g., the most expensive restaurant, the average restaurant, or the restaurant serving the most popular dishes. Our proposed method constructs a new feature space based on the RAPs estimated in each domain and thereby bridges the domain difference, improving content-based retrieval in heterogeneous domains. To verify the effectiveness of our proposed method, we evaluated various methods with a test collection developed for content-based geographic object retrieval. Experimental results show that our proposed method achieved significant improvements over baseline methods. Moreover, we observed that the search performance of content-based retrieval in heterogeneous domains was significantly lower than that in homogeneous domains.
This finding suggests that relevant data for the same search intent depend on the search context, that is, the location where the user searches and the domain from which the system retrieves data.
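The abstract defines a RAP as the expectation of the instances in a domain that belong to a class, but it does not say how class membership is obtained or exactly how the new feature space is built. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' method: membership weights are taken as given, each RAP is estimated as the membership-weighted mean of a domain's instances, and each instance is re-described by its distances to its own domain's RAPs, normalized to sum to one. The functions `estimate_raps`, `relative_features`, and `rank_target`, the distance-based mapping, and the price data are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def estimate_raps(instances: Sequence[Vector],
                  memberships: Sequence[Sequence[float]]) -> List[Vector]:
    """Estimate one RAP per class as a membership-weighted mean of the instances.

    memberships[i][c] is an assumed non-negative weight saying how strongly
    instance i belongs to class c (for example, an estimate of P(c | x_i)).
    """
    n_classes = len(memberships[0])
    dim = len(instances[0])
    raps: List[Vector] = []
    for c in range(n_classes):
        total = sum(m[c] for m in memberships)
        if total <= 0:
            raise ValueError(f"class {c} has no members in this domain")
        # Weighted mean = empirical estimate of E[x | class c] in this domain.
        rap = [sum(m[c] * x[j] for x, m in zip(instances, memberships)) / total
               for j in range(dim)]
        raps.append(rap)
    return raps


def relative_features(x: Vector, raps: Sequence[Vector]) -> Vector:
    """Hypothetical mapping: distances from x to each RAP, normalized to sum to 1.

    Normalizing removes the domain's overall scale (e.g. yuan vs. yen).
    """
    dists = [math.dist(x, r) for r in raps]
    total = sum(dists)
    if total == 0:
        return [1.0 / len(raps)] * len(raps)
    return [d / total for d in dists]


def rank_target(query: Vector, source_raps: Sequence[Vector],
                target: Sequence[Vector], target_raps: Sequence[Vector]) -> List[int]:
    """Rank target instances by closeness to the query in RAP-relative space."""
    q = relative_features(query, source_raps)
    scored = sorted(
        (math.dist(q, relative_features(y, target_raps)), idx)
        for idx, y in enumerate(target)
    )
    return [idx for _, idx in scored]


if __name__ == "__main__":
    # Toy one-dimensional "price" feature; all numbers are made up.
    beijing = [[20.0], [50.0], [80.0], [200.0]]
    kyoto = [[800.0], [2000.0], [3200.0], [8000.0]]
    # Class 0 = "average restaurant" (every instance counts equally),
    # class 1 = "most expensive restaurant" (only the priciest counts).
    memberships = [[1, 0], [1, 0], [1, 0], [1, 1]]
    src_raps = estimate_raps(beijing, memberships)
    tgt_raps = estimate_raps(kyoto, memberships)
    print(rank_target([50.0], src_raps, kyoto, tgt_raps))  # [1, 0, 2, 3]
```

In this toy case the Kyoto prices are a scaled copy of the Beijing prices, so the RAP-relative features line up exactly: the Beijing example priced at 50 retrieves the Kyoto restaurant priced at 2000, whereas nearest-neighbour matching on raw price would pick the one priced at 800. Real domains are rarely related by a simple scale factor, so this only illustrates why comparing instances relative to corresponding RAPs can bridge a domain difference.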


Index Terms

1. Information systems
  1. Information retrieval

Recommendations

IR principles for content-based indexing and retrieval of functional brain images
CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management

In this paper, we explore the concept of a "library of brain images", which implies not only a repository of brain images, but also efficient search and retrieval mechanisms that are based on models derived from IR practice. As a preliminary study, we ...
Learning Similarity Matching in Multimedia Content-Based Retrieval

Many multimedia content-based retrieval systems allow query formulation with user setting of relative importance of features (e.g., color, texture, shape, etc) to mimic the user's perception of similarity. However, the systems do not modify their ...
Query Reformulation for Content Based Multimedia Retrieval in MARS
ICMCS '99: Proceedings of the IEEE International Conference on Multimedia Computing and Systems - Volume 2

Unlike traditional database management systems, in content-based multimedia retrieval databases, it is difficult for users to express their exact information need directly in a precise query. A typical interface allows users to express their information ...


Published in
SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
August 2012
1236 pages
ISBN: 9781450314725
DOI: 10.1145/2348283
General Chair:
William Hersh, Oregon Health & Science University, USA
Program Chairs:
Jamie Callan, Carnegie Mellon University, USA
Yoelle Maarek, Yahoo! Research, Israel
Mark Sanderson, Royal Melbourne Institute of Technology, Australia
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
content-based retrieval
domain adaptation

Acceptance Rates
Overall acceptance rate: 792 of 3,983 submissions, 20%