Abstract
We propose a hybrid method for the extraction and characterization of citations in scientific papers using machine learning combined with rule-based approaches. Our protocol consists of the extraction of metadata, bibliography parsing, section titles processing, and find-grained semantic annotation on the sentence level of texts. This allows us to generate Linked Open Data from a set of research papers in XML.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
PubMed ID Converter API: http://www.ncbi.nlm.nih.gov/pmc/tools/id-converter- api/
- 2.
The D2RQ Platform is a system for accessing relational databases as RDF graphs: http://d2rq.org/
- 3.
References
Bertin, M., Atanassova, I., Lariviere, V., Gingras, Y.: The distribution of references in scientific papers: an analysis of the IMRaD structure. In: Proceedings of the 14th ISSI Conference, pp. 591–603 (2013)
Councill, I.G., Giles, C.L., Kan, M.Y.: ParsCit: an open-source CRF reference string parsing package. In: LREC (2008)
Do, H.H.N., Chandrasekaran, M.K., Cho, P.S., Kan, M.Y.: Extracting and matching authors and affiliations in scholarly documents. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 219–228. ACM (2013)
Shotton, D.: Cito, the citation typing ontology. J. Biomed. Semant. 1(Suppl 1), S6 (2010)
Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by gibbs sampling. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp. 363–370 (2005)
Desclés, J.P.: Contextual exploration processing for discourse and automatic annotations of texts. In: FLAIRS Conference, pp. 281–284 (2006)
Bertin, M., Atanassova, I., Descles, J.P.: Automatic analysis of author judgment in scientific articles based on semantic annotation. In: 22nd International Florida Artificial Intelligence, Research Society Conference, Sanibel Island, Florida. AAAI Press (2009)
Bizer, C., Seaborne, A.: D2RQ-treating non-RDF databases as virtual RDF graphs. In: Proceedings of the 3rd International Semantic Web Conference (ISWC 2004), vol. 2004 (2004)
Cyganiak, R., Bizer, C.: D2R server: a semantic web front-end to existing relational databases. In: XML Tage, 2006, pp. 171–173 (2006)
Shotton, D., Peroni, S.: DoCo, the document components ontology (2011)
Acknowledgments
We thank Angelo Di Iorio at the Department of Computer Science and Engineering (DISI) of the University of Bologna for providing the gold standard and the evaluation.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Bertin, M., Atanassova, I. (2014). Extraction and Characterization of Citations in Scientific Papers. In: Presutti, V., et al. Semantic Web Evaluation Challenge. SemWebEval 2014. Communications in Computer and Information Science, vol 475. Springer, Cham. https://doi.org/10.1007/978-3-319-12024-9_16
Download citation
DOI: https://doi.org/10.1007/978-3-319-12024-9_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12023-2
Online ISBN: 978-3-319-12024-9
eBook Packages: Computer ScienceComputer Science (R0)