Abstract
Several general-purpose databases of research papers are available, including ScienceDirect and Google Scholar. However, these systems may not include recent papers from workshops or conferences for special interest groups. Moreover, researchers in a particular domain can benefit from analyzing research papers using their own terminology. To support these researchers, we propose a new database system that is built on information extracted from Portable Document Format (PDF) documents and that enables annotation of terms via a terminology extraction system. In this paper, we evaluate our system through a use-case experiment and discuss the appropriateness of the system.
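To make the idea of term annotation over extracted text more concrete, the following is a minimal sketch and not the system described in this paper. It assumes that the text of each paper has already been extracted from its PDF, and that a list of domain terms is available (for example, as the output of a terminology extraction step). The function names `annotate_terms` and `build_index`, the sample papers, and the sample terms are all hypothetical.

```python
import re
from collections import defaultdict


def annotate_terms(text, terms):
    """Return sorted (start, end, term) spans, preferring longer terms.

    Hypothetical helper: a plain dictionary match, not the paper's method.
    """
    spans = []
    # Try longer terms first so "nanocrystal device" wins over "device".
    for term in sorted(terms, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            # Skip matches that overlap an already accepted span.
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), term))
    return sorted(spans)


def build_index(papers, terms):
    """Map each term to the set of paper IDs in which it was annotated."""
    index = defaultdict(set)
    for paper_id, text in papers.items():
        for _, _, term in annotate_terms(text, terms):
            index[term].add(paper_id)
    return index


# Illustrative data only; text is assumed to be pre-extracted from PDFs.
papers = {
    "p1": "We fabricate a nanocrystal device on a silicon substrate.",
    "p2": "The device was annealed at 300 C.",
}
terms = ["nanocrystal device", "device", "silicon substrate"]
index = build_index(papers, terms)
```

In this sketch, longer terms take precedence, so a paper that mentions only "nanocrystal device" is not also indexed under the shorter term "device". A real system would likely need a more robust matching strategy than exact, case-insensitive string matching.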
Notes
- 5. ScienceDirect previously offered an image-retrieval feature based on captions (https://www.elsevier.com/about/press-releases/science-and-technology/elsevier-releases-image-search-new-sciverse-sciencedirect-feature-enables-researchers-to-quickly-find-reliable-visual-content). However, this feature is no longer available in the current version.
Acknowledgments
This research was partly supported by ROIS NII Open Collaborative Research 2018-24.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yoshioka, M., Hara, S. (2019). Construction of an In-House Paper/Figure Database System Using Portable Document Format Files. In: Kotzinos, D., Laurent, D., Spyratos, N., Tanaka, Y., Taniguchi, Ri. (eds) Information Search, Integration, and Personalization. ISIP 2018. Communications in Computer and Information Science, vol 1040. Springer, Cham. https://doi.org/10.1007/978-3-030-30284-9_3
DOI: https://doi.org/10.1007/978-3-030-30284-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30283-2
Online ISBN: 978-3-030-30284-9
eBook Packages: Computer Science (R0)