A Manually Annotated Corpus of Pharmaceutical Patents

Kiss, Márton; Nagy, Ágoston; Vincze, Veronika; Almási, Attila; Alexin, Zoltán; Csirik, János

doi:10.1007/978-3-642-32790-2_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7499)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

The language of patent claims differs considerably from ordinary language, so patent processing requires tools specifically adapted to patent language. Evaluating these tools requires manually annotated patent corpora. We therefore constructed a corpus of English-language pharmaceutical patents belonging to class A61K, with several layers of manual annotation (named entities, keys, NucleusNPs, quantitative expressions, heads and complements, and perdurants), on which patent-processing tools can be evaluated.

This work was supported in part by the National Innovation Office of the Hungarian government within the framework of the project MASZEKER.

Author information

Authors and Affiliations

Department of Informatics, University of Szeged, 6720, Szeged, Árpád tér 2., Hungary
Márton Kiss, Ágoston Nagy & Attila Almási
MTA-SZTE Research Group on Artificial Intelligence, 6720, Szeged, Tisza Lajos krt. 103., Hungary
Veronika Vincze & János Csirik
Linguistische Datenverarbeitung, Universität Trier, 54286, Trier, Universitätsring, Germany
Veronika Vincze
Department of Software Engineering, University of Szeged, 6720, Szeged, Árpád tér 2., Hungary
Zoltán Alexin

Editor information

Editors and Affiliations

Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Department of Information Technologies, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Aleš Horák, Ivan Kopeček & Karel Pala

Cite this paper

Kiss, M., Nagy, Á., Vincze, V., Almási, A., Alexin, Z., Csirik, J. (2012). A Manually Annotated Corpus of Pharmaceutical Patents. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol 7499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32790-2_16

DOI: https://doi.org/10.1007/978-3-642-32790-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32789-6
Online ISBN: 978-3-642-32790-2
eBook Packages: Computer Science, Computer Science (R0)