ABSTRACT
In scientific research, it is often difficult to express information needs as simple keyword queries. We present a more natural way of searching for relevant scientific literature. Rather than a string of keywords, we define a query as a small set of papers deemed relevant to the research task at hand. By optimizing an objective function based on a fine-grained notion of influence between documents, our approach efficiently selects a set of highly relevant articles. Moreover, as scientists trust some authors more than others, results are personalized to individual preferences. In a user study, researchers found the papers recommended by our method to be more useful, trustworthy and diverse than those selected by popular alternatives, such as Google Scholar and a state-of-the-art topic modeling approach.