DOI: 10.1145/2020408.2020479
Research article

Beyond keyword search: discovering relevant scientific literature

Published: 21 August 2011

Abstract

In scientific research, it is often difficult to express information needs as simple keyword queries. We present a more natural way of searching for relevant scientific literature. Rather than a string of keywords, we define a query as a small set of papers deemed relevant to the research task at hand. By optimizing an objective function based on a fine-grained notion of influence between documents, our approach efficiently selects a set of highly relevant articles. Moreover, as scientists trust some authors more than others, results are personalized to individual preferences. In a user study, researchers found the papers recommended by our method to be more useful, trustworthy and diverse than those selected by popular alternatives, such as Google Scholar and a state-of-the-art topic modeling approach.
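The abstract does not spell out the objective function, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each candidate paper d covers each query-relevant concept c with a probability p_d(c), a stand-in for the paper's fine-grained influence, and scores a set A of papers by probabilistic coverage F(A) = sum_c w_c * (1 - prod_{d in A} (1 - p_d(c))). Because this function is monotone and submodular, the standard greedy algorithm, which repeatedly adds the paper with the largest marginal gain, is guaranteed to reach at least a (1 - 1/e) fraction of the best achievable score for k papers. Personalization is modeled by scaling a paper's influence by a trust factor for its authors. All names here (`influence`, `concept_weight`, `author_trust`, the paper and concept labels) are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: influence[paper][concept] is a probability in
# [0, 1] standing in for how strongly a paper covers a concept of the query.
Influence = Dict[str, Dict[str, float]]


def coverage(selected: List[str], influence: Influence,
             concept_weight: Dict[str, float]) -> float:
    """Probabilistic coverage: sum_c w_c * (1 - prod_{d in selected} (1 - p_d(c)))."""
    total = 0.0
    for concept, weight in concept_weight.items():
        miss = 1.0  # probability that no selected paper covers this concept
        for doc in selected:
            miss *= 1.0 - influence[doc].get(concept, 0.0)
        total += weight * (1.0 - miss)
    return total


def personalize(influence: Influence, authors: Dict[str, List[str]],
                author_trust: Dict[str, float]) -> Influence:
    """Scale each paper's influence by the highest trust among its authors.

    Assumption: authors missing from author_trust get full trust (1.0),
    and papers with no listed authors are left unscaled.
    """
    scaled: Influence = {}
    for doc, row in influence.items():
        factor = max((author_trust.get(a, 1.0) for a in authors.get(doc, [])),
                     default=1.0)
        scaled[doc] = {c: p * factor for c, p in row.items()}
    return scaled


def greedy_select(influence: Influence, concept_weight: Dict[str, float],
                  k: int) -> List[str]:
    """Add the paper with the largest marginal coverage gain, up to k times."""
    selected: List[str] = []
    remaining = sorted(influence)  # sorted so ties break deterministically
    for _ in range(min(k, len(remaining))):
        base = coverage(selected, influence, concept_weight)
        best = max(remaining,
                   key=lambda d: coverage(selected + [d], influence, concept_weight) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, made-up data for illustration only.
influence = {
    "paperA": {"topic_models": 0.9, "citation_graphs": 0.2},
    "paperB": {"topic_models": 0.8},
    "paperC": {"citation_graphs": 0.7, "submodularity": 0.6},
}
concept_weight = {"topic_models": 1.0, "citation_graphs": 1.0, "submodularity": 0.5}

print(greedy_select(influence, concept_weight, 2))  # ['paperA', 'paperC']

# A user who distrusts paperC's (hypothetical) author gets a different set.
trusted = personalize(influence, {"paperC": ["author_x"]}, {"author_x": 0.05})
print(greedy_select(trusted, concept_weight, 2))  # ['paperA', 'paperB']
```

Note that the greedy step favors paperC over paperB on the second pick even though paperB alone scores higher than paperC would after paperA: paperB mostly re-covers `topic_models`, which paperA already covers, so its marginal gain is small. This diminishing-returns behavior is what pushes the selection toward diverse results. A lazy-evaluation variant of greedy would reach the same selection (up to tie-breaking) with fewer objective evaluations; plain greedy is kept here for clarity.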

Information

Published In

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011
1446 pages
ISBN:9781450308137
DOI:10.1145/2020408

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. citation analysis
  2. personalization

Qualifiers

  • Research-article

Conference

KDD '11

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 64
  • Downloads (last 6 weeks): 4
Reflects downloads up to 03 Mar 2025

Cited By

  • (2024) Python Driven Keyword Analysis for SEO Optimization. 2024 10th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 1170-1176. DOI: 10.1109/ICACCS60874.2024.10717132. Online publication date: 14-Mar-2024.
  • (2024) Submodular Maximization Subject to Uniform and Partition Matroids: From Theory to Practical Applications and Distributed Solutions. Reference Module in Materials Science and Materials Engineering. DOI: 10.1016/B978-0-443-14081-5.00090-8. Online publication date: 2024.
  • (2024) Improved Streaming Algorithm for Minimum Cost Submodular Cover Problem. Computational Data and Social Networks, pp. 222-233. DOI: 10.1007/978-981-97-0669-3_21. Online publication date: 29-Feb-2024.
  • (2023) Dynamic non-monotone submodular maximization. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 17369-17382. DOI: 10.5555/3666122.3666883. Online publication date: 10-Dec-2023.
  • (2023) Fairness in streaming submodular maximization over a matroid constraint. Proceedings of the 40th International Conference on Machine Learning, pp. 9150-9171. DOI: 10.5555/3618408.3618775. Online publication date: 23-Jul-2023.
  • (2023) Hierarchical Transformer-based Query by Multiple Documents. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, pp. 105-115. DOI: 10.1145/3578337.3605130. Online publication date: 9-Aug-2023.
  • (2023) Algorithms for Cardinality-Constrained Monotone DR-Submodular Maximization with Low Adaptivity and Query Complexity. Journal of Optimization Theory and Applications, 200(1), pp. 194-214. DOI: 10.1007/s10957-023-02353-7. Online publication date: 18-Dec-2023.
  • (2023) Group fairness in non-monotone submodular maximization. Journal of Combinatorial Optimization, 45(3). DOI: 10.1007/s10878-023-01019-4. Online publication date: 25-Mar-2023.
  • (2022) Fast Algorithm for Big Data Summarization with Knapsack and Partition Matroid Constraints. 2022 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), pp. 1-6. DOI: 10.1109/INISTA55318.2022.9894252. Online publication date: 8-Aug-2022.
  • (2022) Research Concept Link Prediction via Graph Convolutional Network. 2022 8th International Conference on Big Data and Information Analytics (BigDIA), pp. 220-225. DOI: 10.1109/BigDIA56350.2022.9874237. Online publication date: 24-Aug-2022.
