ACM: Article Content Miner for Assessing the Quality of Scientific Output

  • Conference paper
Semantic Web Challenges (SemWebEval 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 641)

Abstract

This paper presents the Article Content Miner (ACM), a method that processes the PDF research papers provided for the 2016 edition of the Semantic Publishing Challenge, extracts relevant semantic data from them, and publishes that data in an RDF triplestore according to the Semantic Publishing And Referencing (SPAR) Ontologies (http://www.sparontologies.net). In particular, ACM guarantees the extraction of all the information needed to answer the queries of the challenge's second task (https://github.com/ceurws/lod/wiki/SemPub16_Task2) by combining Natural Language Processing techniques (Combinatory Categorial Grammar, Discourse Representation Theory, Linguistic Frames), Semantic Web technologies, and good Ontology Design practices (Content Analysis, Ontology Design Patterns, Discourse Referent Extraction and Linking, Topic Extraction).
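The abstract describes, at a high level, publishing extracted metadata as RDF with SPAR ontologies and then querying it for Task 2. As a rough illustration only (not ACM's actual code or data model), the sketch below stores triples as plain Python tuples. A paper is typed with FaBiO, its authors are attached through PRO's role-in-time pattern, and a simple triple-pattern lookup stands in for a SPARQL query asking for the authors' affiliations. The example.org URIs, the helper names (`author_triples`, `match`, `affiliations`, `to_ntriples`), and the use of `pro:relatesToOrganization` to model affiliation are all assumptions made for illustration.

```python
"""Minimal sketch: SPAR-style triples for one paper and a Task 2-style lookup.

Assumptions (not taken from the ACM paper): the example.org URIs, the helper
names, and modelling affiliation with pro:relatesToOrganization.
"""
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FABIO = "http://purl.org/spar/fabio/"
PRO = "http://purl.org/spar/pro/"
EX = "http://example.org/"  # hypothetical namespace for extracted entities


def author_triples(paper: str, person: str, org: str, n: int) -> Set[Triple]:
    """Link a person to a paper through a PRO role-in-time node."""
    role = f"{paper}/role/{n}"  # hypothetical URI scheme for role nodes
    return {
        (person, PRO + "holdsRoleInTime", role),
        (role, RDF_TYPE, PRO + "RoleInTime"),
        (role, PRO + "withRole", PRO + "author"),
        (role, PRO + "relatesToDocument", paper),
        (role, PRO + "relatesToOrganization", org),  # assumed affiliation link
    }


def match(graph: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching one pattern; None acts as a wildcard."""
    for t in graph:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def affiliations(graph: Set[Triple], paper: str) -> Set[str]:
    """Collect organizations attached to the author roles of `paper`."""
    orgs: Set[str] = set()
    for role, _, _ in match(graph, p=PRO + "relatesToDocument", o=paper):
        # Keep only role nodes whose role is pro:author.
        if any(match(graph, role, PRO + "withRole", PRO + "author")):
            orgs |= {o for _, _, o in match(graph, role, PRO + "relatesToOrganization")}
    return orgs


def to_ntriples(graph: Set[Triple]) -> str:
    """Serialize IRI-only triples as N-Triples lines, sorted for stable output."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


if __name__ == "__main__":
    paper = EX + "paper/1"
    g: Set[Triple] = {(paper, RDF_TYPE, FABIO + "ConferencePaper")}
    g |= author_triples(paper, EX + "person/alice", EX + "org/unibo", 1)
    g |= author_triples(paper, EX + "person/bob", EX + "org/cnr", 2)
    print(sorted(affiliations(g, paper)))
    print(to_ntriples(g))
```

Running the script prints the two hypothetical organization URIs followed by the graph in N-Triples. A real pipeline would more likely load such triples into a triplestore and answer the Task 2 queries with SPARQL; the `match` helper only mimics a single triple pattern with wildcards.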

Notes

  1. http://www.sparontologies.net
  2. http://wit.istc.cnr.it/stlab-tools/fred
  3. https://github.com/ceurws/lod/wiki/SemPub16_Task2
  4. https://github.com/ceurws/lod/wiki/Task2
  5. http://2014.eswc-conferences.org/
  6. http://www.sparontologies.net
  7. http://purl.org/spar/fabio
  8. http://purl.org/spar/doco
  9. http://purl.org/spar/pro
  10. http://purl.org/cerif/frapo
  11. https://github.com/euske/pdfminer/
  12. http://wit.istc.cnr.it/stlab-tools/fred
  13. http://www.sparontologies.net
  14. https://github.com/ceurws/lod/wiki/SemPub2015
  15. https://github.com/ceurws/lod/wiki/SemPub2016
  16. https://github.com/ceurws/lod/wiki/SemPub16_Task2
  17. https://github.com/ceurws/lod/wiki/SemPub2016
  18. https://github.com/ceurws/lod/wiki/SemPub16_Task2
  19. https://github.com/ceurws/lod/wiki/SemPub2016

Author information

Corresponding author

Correspondence to Diego Reforgiato Recupero.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Nuzzolese, A.G., Peroni, S., Reforgiato Recupero, D. (2016). ACM: Article Content Miner for Assessing the Quality of Scientific Output. In: Sack, H., Dietze, S., Tordai, A., Lange, C. (eds) Semantic Web Challenges. SemWebEval 2016. Communications in Computer and Information Science, vol 641. Springer, Cham. https://doi.org/10.1007/978-3-319-46565-4_22

  • DOI: https://doi.org/10.1007/978-3-319-46565-4_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46564-7

  • Online ISBN: 978-3-319-46565-4

  • eBook Packages: Computer Science, Computer Science (R0)