research-article

Entity-centric document filtering: boosting feature mapping through meta-features

Authors:
Mianwei Zhou

University of Illinois at Urbana-Champaign, Urbana, IL, USA

Kevin Chen-Chuan Chang

University of Illinois at Urbana-Champaign, Urbana, IL, USA


CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management, October 2013, Pages 119–128
https://doi.org/10.1145/2505515.2505683

Published: 27 October 2013


ABSTRACT

This paper studies the entity-centric document filtering task: given an entity represented by its identification page (e.g., a Wikipedia page), how can we correctly identify the documents relevant to it? In particular, we are interested in learning an entity-centric document filter from a small number of training entities, such that the filter can predict document relevance for a large set of unseen entities at query time. Characterizing the relevance of a document boils down to learning keyword importance for the query entities. Since the same keyword can have very different importance for different entities, we cast entity-centric document filtering as a transfer learning problem, where the challenge is how to appropriately transfer the keyword importance learned from training entities to query entities. Based on the insight that keywords sharing similar "properties" should have similar importance for their respective entities, we propose the novel concept of a meta-feature to map keywords across different entities. To realize meta-feature-based feature mapping, we develop and contrast two models, LinearMapping and BoostMapping. Experiments on three datasets confirm the effectiveness of the proposed models, which show significant improvement over four state-of-the-art baseline methods.
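The idea of transferring keyword importance through shared keyword properties can be illustrated with a minimal sketch. It is not the paper's LinearMapping or BoostMapping model, whose details the abstract does not give. It assumes that each keyword is described by a small vector of hypothetical meta-features (here: appears in the lead of the entity page, is a rare word, is a common word), that keyword importance is a linear function of those meta-features, and that a document is scored by summing the importance of the entity keywords it contains. The weights are fit with plain stochastic gradient descent on the logistic loss, which is a stand-in chosen for brevity.

```python
import math
from typing import List, Sequence, Tuple

# One training example: the meta-feature vectors of the entity keywords that
# occur in a document, plus a relevance label (1 = relevant, 0 = not).
# Hypothetical meta-features per keyword: (in_lead, is_rare, is_common).
Example = Tuple[List[Sequence[float]], int]


def doc_features(keyword_meta: List[Sequence[float]], dim: int) -> List[float]:
    """Sum the meta-feature vectors of all matched keywords."""
    total = [0.0] * dim
    for meta in keyword_meta:
        for i, value in enumerate(meta):
            total[i] += value
    return total


def score(weights: Sequence[float], keyword_meta: List[Sequence[float]]) -> float:
    """Document score: sum over matched keywords of weights . meta(keyword)."""
    feats = doc_features(keyword_meta, len(weights))
    return sum(w * f for w, f in zip(weights, feats))


def train(examples: List[Example], dim: int,
          lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Fit meta-feature weights by stochastic gradient descent (logistic loss)."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for keyword_meta, label in examples:
            feats = doc_features(keyword_meta, dim)
            z = sum(w * f for w, f in zip(weights, feats))
            pred = 1.0 / (1.0 + math.exp(-z))
            for i in range(dim):
                weights[i] += lr * (label - pred) * feats[i]
    return weights


# Toy documents for the training entities (made-up values for illustration).
training: List[Example] = [
    ([(1, 1, 0), (1, 0, 0)], 1),  # matches lead and rare keywords: relevant
    ([(0, 0, 1), (0, 0, 1)], 0),  # matches only common keywords: not relevant
    ([(0, 1, 0)], 1),
    ([(0, 0, 1)], 0),
]
weights = train(training, dim=3)

# Documents for an unseen entity: its keywords differ from the training
# keywords, but they are described by the same meta-features, so the learned
# weights still apply.
unseen_relevant = [(1, 1, 0)]
unseen_irrelevant = [(0, 0, 1)]
print(score(weights, unseen_relevant) > score(weights, unseen_irrelevant))
```

Because the weights are attached to meta-features rather than to individual keywords, nothing in the model refers to a specific entity's vocabulary, which is what lets a filter trained on a few entities score documents for new ones.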


Index Terms

Entity-centric document filtering: boosting feature mapping through meta-features
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking


Published in
CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN:9781450322638
DOI:10.1145/2505515
General Chairs: Qi He (LinkedIn, USA), Arun Iyengar (IBM T.J. Watson Research Center, USA)
Program Chairs: Wolfgang Nejdl (L3S Research Center, Germany), Jian Pei (Simon Fraser University, Canada), Rajeev Rastogi (Amazon, India)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
entity centric
feature mapping
meta feature
transfer learning
Acceptance Rates
CIKM '13 Paper Acceptance Rate: 143 of 848 submissions, 17%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%