A Novel Knowledge-Based Architecture for Concept Mining on Italian and English Texts

Degl’Innocenti, Dante; De Nart, Dario; Tasso, Carlo

doi:10.1007/978-3-319-25840-9_9

Conference paper
First Online: 28 October 2015

Part of the book series: Communications in Computer and Information Science (CCIS, volume 553)

Abstract

Manually annotating unstructured texts to find significant concepts is a knowledge-intensive process and, given the amount of data available nowadays on the Web and in digital libraries, it is not cost-effective. Therefore, automatic annotators capable of performing like human experts are extremely desirable. State-of-the-art systems already offer good performance, but they are often limited to one language and one domain of application, and they cannot infer concepts that do not appear in the text but are logically or semantically implied by it. In order to overcome these shortcomings, we propose a novel knowledge-based, language-independent, unsupervised approach to keyphrase generation. We developed DIKpE-G, an experimental prototype system that integrates different kinds of knowledge, ranging from linguistic to statistical, meta/structural, social, and ontological knowledge. DIKpE-G is capable of extracting, evaluating, and inferring meaningful concepts from a natural language text. The prototype performs well on both Italian and English texts.
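The abstract describes the architecture only at a high level, so the following is a minimal, hypothetical Python sketch of the general idea and not DIKpE-G's actual algorithm. Candidate phrases are chunked with a per-language stopword list (a crude stand-in for linguistic knowledge), scored by frequency (statistical knowledge) and by how early they first occur (structural knowledge), and a toy hand-written ontology is then used to infer broader concepts that never appear in the text. The stopword lists, the weights `w_freq`, `w_pos` and `w_len`, the `ONTOLOGY` dictionary, and all function names are illustrative assumptions; social knowledge is omitted entirely.

```python
import re
from collections import Counter

# Illustrative stopword lists (assumption): a real system would rely on
# full lists plus POS tagging rather than stopword-based chunking.
STOPWORDS = {
    "en": {"a", "an", "the", "of", "and", "or", "for", "in", "on", "to",
           "is", "are", "we", "this", "that", "with", "from", "by", "as"},
    "it": {"il", "lo", "la", "l", "i", "gli", "le", "un", "una", "di", "e",
           "o", "per", "in", "su", "da", "con", "che", "è", "sono", "del",
           "della"},
}

# Hypothetical toy ontology mapping a phrase to broader concepts it implies.
ONTOLOGY = {
    "keyphrase extraction": ["information extraction"],
    "estrazione di concetti": ["elaborazione del linguaggio naturale"],
}


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware, so accented letters are kept.
    return re.findall(r"\w+", text.lower())


def candidates(tokens, stopwords, max_len=3):
    """Yield (phrase, position) for every n-gram (n <= max_len) that
    neither starts nor ends with a stopword."""
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            if len(gram) < n:
                break  # ran past the end of the text
            if gram[0] in stopwords or gram[-1] in stopwords:
                continue
            yield " ".join(gram), i


def score_candidates(text, lang, max_len=3, w_freq=1.0, w_pos=1.0, w_len=0.5):
    """Rank candidates by a weighted sum of frequency, earliness of first
    occurrence, and phrase length (illustrative weights)."""
    tokens = tokenize(text)
    freq, first = Counter(), {}
    for phrase, pos in candidates(tokens, STOPWORDS[lang], max_len):
        freq[phrase] += 1
        first.setdefault(phrase, pos)
    total = max(len(tokens), 1)
    scores = {}
    for phrase, f in freq.items():
        earliness = 1.0 - first[phrase] / total  # 1.0 at the start of the text
        extra_words = len(phrase.split()) - 1    # favour multi-word phrases
        scores[phrase] = w_freq * f + w_pos * earliness + w_len * extra_words
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def generate(text, lang, k=5, ontology=ONTOLOGY):
    """Return the top-k extracted phrases and the concepts inferred from them."""
    ranked = [phrase for phrase, _ in score_candidates(text, lang)[:k]]
    inferred = []
    for phrase in ranked:
        for concept in ontology.get(phrase, []):
            if concept not in ranked and concept not in inferred:
                inferred.append(concept)
    return ranked, inferred


if __name__ == "__main__":
    print(generate("Keyphrase extraction finds keyphrases. "
                   "Keyphrase extraction is useful.", "en"))
```

On the English sample, "keyphrase extraction" ranks first and the toy ontology adds "information extraction", a concept absent from the text. The same heuristic also places noisy candidates such as "keyphrase extraction finds" near the top, which is why part-of-speech filtering, stemming, and the richer knowledge sources listed in the abstract matter in practice.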


Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Artificial Intelligence Lab, University of Udine, Udine, Italy
Dante Degl’Innocenti, Dario De Nart & Carlo Tasso

Corresponding author

Correspondence to Dante Degl’Innocenti.

Editor information

Editors and Affiliations

Instituto de Telecomunicações, Lisboa, Portugal
Ana Fred
Delft University of Technology, Delft, Zuid-Holland, The Netherlands
Jan L. G. Dietz
University of Madeira, Funchal, Portugal
David Aveiro
Henley Business School, University of Reading, Reading, United Kingdom
Kecheng Liu
INSTICC, Setubal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Degl’Innocenti, D., De Nart, D., Tasso, C. (2015). A Novel Knowledge-Based Architecture for Concept Mining on Italian and English Texts. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2014. Communications in Computer and Information Science, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-319-25840-9_9

DOI: https://doi.org/10.1007/978-3-319-25840-9_9
Published: 28 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25839-3
Online ISBN: 978-3-319-25840-9
eBook Packages: Computer Science, Computer Science (R0)
