Massive Scientific Paper Mining: Modeling, Design and Implementation

Zhou, Yang; Ji, Shufan; Xu, Ke

doi:10.1007/978-3-642-41491-6_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8202)

Abstract

With the dramatic increase in the number of scientific research papers, scientific paper mining systems have become more popular for efficient paper retrieval and analysis. However, existing keyword-based search engines and language- or topic-model-based mining systems cannot provide customized queries that match varying user requirements. In this paper, we therefore propose a novel TAIL (Time-Author-Institute-Literature) model to capture the relationships among literature, authors, institutes and time stamps. Based on the TAIL model, we implement the Massive Scientific Paper Mining (MSPM) system and set up a B/S (Browser/Server) structure for web services. The evaluation results on large real-world data show that our MSPM system can deliver desirable mining results, providing valuable data support for scientific research cooperation.
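The abstract does not spell out how the TAIL model is represented, so the following is only an illustrative sketch of the general idea: one record type linking literature, authors, institutes and a time stamp, plus an index that answers queries combining those dimensions. The names `Paper`, `TailIndex`, `papers` and `coauthors` are hypothetical and are not taken from the MSPM system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """One literature record with its time stamp, authors and institutes."""
    title: str
    year: int
    authors: tuple      # author names
    institutes: tuple   # institute names, one per author, in the same order


class TailIndex:
    """Hypothetical in-memory index over time, author, institute, literature."""

    def __init__(self):
        self.by_author = defaultdict(list)
        self.by_institute = defaultdict(list)
        self.by_year = defaultdict(list)

    def add(self, paper):
        for author in paper.authors:
            self.by_author[author].append(paper)
        # A set avoids indexing the paper twice under a shared institute.
        for institute in set(paper.institutes):
            self.by_institute[institute].append(paper)
        self.by_year[paper.year].append(paper)

    def papers(self, author=None, institute=None, year_range=None):
        """Return the papers that satisfy every filter that is not None."""
        if author is not None:
            candidates = self.by_author.get(author, [])
        elif institute is not None:
            candidates = self.by_institute.get(institute, [])
        else:
            candidates = [p for ps in self.by_year.values() for p in ps]
        result = []
        for p in candidates:
            if institute is not None and institute not in p.institutes:
                continue
            if year_range is not None:
                first, last = year_range  # inclusive bounds
                if not first <= p.year <= last:
                    continue
            result.append(p)
        return result

    def coauthors(self, author):
        """Count how often each other author shares a paper with `author`."""
        counts = defaultdict(int)
        for p in self.by_author.get(author, []):
            for other in p.authors:
                if other != author:
                    counts[other] += 1
        return dict(counts)
```

A customized query in this sketch is a combination of filters, for example `index.papers(author="Author X", year_range=(2012, 2013))`, where the author name is a placeholder. A system operating on massive data would presumably need persistent storage and a server-side query layer rather than in-memory dictionaries.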

Author information

Authors and Affiliations

State Key Lab. of Software Development Environment, Beihang University, Beijing, 100191, P.R. China
Yang Zhou, Shufan Ji & Ke Xu

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Maosong Sun
Horizon Doctoral Training Centre, School of Computer Science, University of Nottingham, NG8 1BB, Nottingham, UK
Min Zhang
Google Inc., Mountain View, CA, USA
Dekang Lin
Baidu Inc., Beijing, China
Haifeng Wang

About this paper

Cite this paper

Zhou, Y., Ji, S., Xu, K. (2013). Massive Scientific Paper Mining: Modeling, Design and Implementation. In: Sun, M., Zhang, M., Lin, D., Wang, H. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2013, CCL 2013. Lecture Notes in Computer Science (LNAI), vol 8202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41491-6_32

DOI: https://doi.org/10.1007/978-3-642-41491-6_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41490-9
Online ISBN: 978-3-642-41491-6
eBook Packages: Computer Science, Computer Science (R0)
