DOI: 10.1145/2020408.2020479
research-article

Beyond keyword search: discovering relevant scientific literature

Published: 21 August 2011

ABSTRACT

In scientific research, it is often difficult to express information needs as simple keyword queries. We present a more natural way of searching for relevant scientific literature. Rather than a string of keywords, we define a query as a small set of papers deemed relevant to the research task at hand. By optimizing an objective function based on a fine-grained notion of influence between documents, our approach efficiently selects a set of highly relevant articles. Moreover, as scientists trust some authors more than others, results are personalized to individual preferences. In a user study, researchers found the papers recommended by our method to be more useful, trustworthy and diverse than those selected by popular alternatives, such as Google Scholar and a state-of-the-art topic modeling approach.
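The selection step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' method: the `influence` scores are hypothetical hand-written numbers, the coverage-style objective is a stand-in for the paper's influence-based objective, and greedy selection is simply one common way to optimize such set objectives. The abstract itself does not specify any of these details.

```python
from typing import Dict, List, Set

# Hypothetical influence scores in [0, 1]: influence[candidate][query_paper].
Influence = Dict[str, Dict[str, float]]


def coverage(selected: List[str], query: Set[str], influence: Influence) -> float:
    """Assumed objective: for each query paper, the chance that at least one
    selected paper is connected to it by influence, summed over the query."""
    total = 0.0
    for q in query:
        miss = 1.0  # probability that no selected paper covers q
        for d in selected:
            miss *= 1.0 - influence.get(d, {}).get(q, 0.0)
        total += 1.0 - miss
    return total


def greedy_select(query: Set[str], influence: Influence, k: int) -> List[str]:
    """Pick up to k papers, each time adding the one with the largest gain."""
    selected: List[str] = []
    candidates = set(influence) - query
    for _ in range(min(k, len(candidates))):
        base = coverage(selected, query, influence)
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(candidates),
            key=lambda d: coverage(selected + [d], query, influence) - base,
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: "b" scores higher than "c" on its own, but it overlaps with "a".
influence = {
    "a": {"q1": 0.9, "q2": 0.0},
    "b": {"q1": 0.8, "q2": 0.0},
    "c": {"q1": 0.0, "q2": 0.7},
}
picked = greedy_select({"q1", "q2"}, influence, k=2)  # ["a", "c"]
```

In this toy example the second pick is "c" rather than "b", because "b" adds little beyond what "a" already covers. That is one way a set-based objective can favor diverse results over a ranking by individual score.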


Published in: KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2011, 1446 pages
ISBN: 9781450308137
DOI: 10.1145/2020408

            Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 1,133 of 8,635 submissions (13%)
