Abstract
The need for academic researchers to retrieve patents as well as research papers is increasing, because applying for patents is now considered an important research activity. However, retrieving patents by keyword is laborious for researchers, because patents, in order to enlarge the scope of their claims, generally use more abstract terms than research papers do. We have therefore constructed a framework that facilitates patent retrieval for researchers by integrating research papers and patents through an analysis of the citation relationships between them. We obtained the research papers cited in patents in two steps: (1) detection of sentences containing bibliographic information, and (2) extraction of bibliographic information from those sentences. To investigate the effectiveness of our method, we conducted two experiments. For Step 1, we prepared 42,073 sentences, among which a human subject manually identified 1,476 sentences containing citations of papers. For Step 2, we prepared 3,000 sentences, in which the titles, authors, and other bibliographic information were manually identified. We obtained a precision of 91.6% and a recall of 86.9% in Step 1, and a precision of 86.2% and a recall of 85.1% in Step 2. Finally, we constructed an information retrieval system that provides two methods of retrieving research papers and patents: retrieval by query, and retrieval by following the citation relationships between research papers and patents.
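The precision and recall figures above are most naturally read under the standard definitions: precision is the fraction of detected items that are correct, and recall is the fraction of manually identified items that were detected. The following is a minimal sketch of that computation, not the evaluation code used in the study. The names `Counts` and `evaluate`, and the toy sentence IDs, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Counts:
    """Confusion counts for a detection task (hypothetical helper type)."""
    true_positives: int
    false_positives: int
    false_negatives: int


def evaluate(predicted: Set[int], gold: Set[int]) -> Counts:
    """Compare predicted sentence IDs against manually labelled (gold) IDs."""
    return Counts(
        true_positives=len(predicted & gold),
        false_positives=len(predicted - gold),
        false_negatives=len(gold - predicted),
    )


def precision(c: Counts) -> float:
    """Fraction of predicted items that are correct."""
    denom = c.true_positives + c.false_positives
    return c.true_positives / denom if denom else 0.0


def recall(c: Counts) -> float:
    """Fraction of gold items that were found."""
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else 0.0


# Toy data (assumed, not from the paper): sentence IDs flagged as citations.
gold_ids = {1, 2, 3, 4}
predicted_ids = {2, 3, 4, 5}
counts = evaluate(predicted_ids, gold_ids)
print(f"precision={precision(counts):.3f} recall={recall(counts):.3f}")
```

In the Step 1 setting, `gold` would correspond to the 1,476 manually identified citation sentences and `predicted` to the sentences flagged by the detector among the 42,073 prepared.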
Cite this article
Nanba, H., Anzen, N. & Okumura, M. Automatic extraction of citation information in Japanese patent applications. Int J Digit Libr 9, 151–161 (2008). https://doi.org/10.1007/s00799-008-0045-x
DOI: https://doi.org/10.1007/s00799-008-0045-x