DOI: 10.1145/2505515.2505683
research-article

Entity-centric document filtering: boosting feature mapping through meta-features

Published: 27 October 2013

ABSTRACT

This paper studies the entity-centric document filtering task: given an entity represented by its identification page (e.g., a Wikipedia page), how to correctly identify its relevant documents. In particular, we are interested in learning an entity-centric document filter from a small number of training entities, such that the filter can predict document relevance for a large set of unseen entities at query time. To characterize the relevance of a document, the problem boils down to learning keyword importance for the query entities. Since the same keyword can have very different importance for different entities, we abstract entity-centric document filtering as a transfer learning problem, and the challenge becomes how to appropriately transfer the keyword importance learned from training entities to query entities. Based on the insight that keywords sharing similar "properties" should have similar importance for their respective entities, we propose the novel concept of a meta-feature to map keywords across different entities. To realize meta-feature-based feature mapping, we develop and contrast two models, LinearMapping and BoostMapping. Experiments on three datasets confirm the effectiveness of the proposed models, which show significant improvement over four state-of-the-art baseline methods.
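The core idea, that a keyword's importance is predicted from entity-independent meta-features rather than from the keyword itself, can be illustrated with a minimal sketch. This is not the paper's LinearMapping or BoostMapping model: the two meta-features used here (term frequency in the entity page and a supplied IDF value), the fixed weight vector, and all function names are assumptions chosen only for illustration.

```python
from typing import Dict, List


def meta_features(keyword: str, entity_page: List[str],
                  idf: Dict[str, float]) -> List[float]:
    """Describe a keyword by properties that are comparable across entities.

    Assumed (hypothetical) meta-features: relative frequency of the keyword
    in the entity's identification page, and its IDF value.
    """
    tf = entity_page.count(keyword) / max(len(entity_page), 1)
    return [tf, idf.get(keyword, 0.0)]


def keyword_importance(keyword: str, entity_page: List[str],
                       idf: Dict[str, float], weights: List[float]) -> float:
    """Importance as a linear function of the meta-features.

    The weights would be learned on training entities; because they attach
    to meta-features, they can be reused for unseen entities.
    """
    feats = meta_features(keyword, entity_page, idf)
    return sum(w * f for w, f in zip(weights, feats))


def score_document(doc_tokens: List[str], entity_page: List[str],
                   idf: Dict[str, float], weights: List[float]) -> float:
    """Sum the importance of entity keywords that also occur in the document."""
    shared = set(entity_page) & set(doc_tokens)
    return sum(keyword_importance(k, entity_page, idf, weights) for k in shared)


# Toy data, invented for the example.
entity_page = ["turing", "machine", "turing", "award"]
idf = {"turing": 2.0, "machine": 1.0, "award": 1.5}
weights = [1.0, 0.5]  # hypothetical weights, as if learned from training entities

score_turing = score_document(["turing", "test"], entity_page, idf, weights)
score_machine = score_document(["machine", "learning"], entity_page, idf, weights)
score_unrelated = score_document(["weather", "report"], entity_page, idf, weights)
```

With these toy values, the document mentioning "turing" scores higher than the one mentioning "machine", and a document sharing no keywords with the entity page scores zero. A filter would then threshold the score to decide relevance.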


Published in

CIKM '13: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
October 2013
2612 pages
ISBN: 9781450322638
DOI: 10.1145/2505515

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CIKM '13 paper acceptance rate: 143 of 848 submissions, 17%. Overall acceptance rate: 1,861 of 8,427 submissions, 22%.
