Construction of an In-House Paper/Figure Database System Using Portable Document Format Files

  • Conference paper
Information Search, Integration, and Personalization (ISIP 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1040)

Abstract

Several general-purpose databases of research papers are available, including ScienceDirect and Google Scholar. However, these systems may not include recent papers from workshops or conferences for special interest groups. Moreover, researchers in a particular domain often benefit from analyzing research papers using their own terminology. To support these researchers, we propose a new database system that is built on information extracted from Portable Document Format (PDF) documents and that enables annotation of terms via a terminology extraction system. In this paper, we evaluate the system in a use-case experiment and discuss its appropriateness.

Notes

  1. http://www.sciencedirect.com/

  2. https://scholar.google.com/

  3. http://pdffigures2.allenai.org/

  4. http://www.semanticscholar.org/

  5. ScienceDirect previously had an image-retrieval feature using captions (https://www.elsevier.com/about/press-releases/science-and-technology/elsevier-releases-image-search-new-sciverse-sciencedirect-feature-enables-researchers-to-quickly-find-reliable-visual-content). However, this feature is not available in the current version.

  6. http://gensen.dl.itc.u-tokyo.ac.jp/termextract.html

  7. http://brat.nlplab.org/

  8. https://chem.nlm.nih.gov/chemidplus/chemidlite.jsp

Acknowledgments

This research was partly supported by ROIS NII Open Collaborative Research 2018-24.

Author information

Corresponding author

Correspondence to Masaharu Yoshioka.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yoshioka, M., Hara, S. (2019). Construction of an In-House Paper/Figure Database System Using Portable Document Format Files. In: Kotzinos, D., Laurent, D., Spyratos, N., Tanaka, Y., Taniguchi, Ri. (eds) Information Search, Integration, and Personalization. ISIP 2018. Communications in Computer and Information Science, vol 1040. Springer, Cham. https://doi.org/10.1007/978-3-030-30284-9_3

  • DOI: https://doi.org/10.1007/978-3-030-30284-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30283-2

  • Online ISBN: 978-3-030-30284-9

  • eBook Packages: Computer Science, Computer Science (R0)
