Exploiting Wikipedia-Based Information-Rich Taxonomy for Extracting Location, Creator and Membership Related Information for ConceptNet Expansion

Krawczyk, Marek; Rzepka, Rafal; Araki, Kenji

doi:10.1007/978-3-319-93782-3_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10930)

Included in the following conference series:

Language and Technology Conference

Abstract

We present a method for automatically extracting five types of assertions from Japanese Wikipedia XML dump files: IsA (hyponymy relations), AtLocation (the location of an object or place), LocatedNear (neighboring locations), CreatedBy (the creator of an object) and MemberOf (group membership). We use the Hyponymy extraction tool v1.0, which analyses the definition, category and hierarchy structures of Wikipedia articles, to extract IsA assertions and produce an information-rich taxonomy. From this taxonomy we then extract AtLocation, LocatedNear, CreatedBy and MemberOf assertions using our original method. The experiments show that both methods produce satisfactory results: we acquired 5,866,680 IsA assertions with 96.0% reliability, 131,760 AtLocation assertion pairs with 93.5% reliability, 6,217 LocatedNear assertion pairs with 98.5% reliability, 270,230 CreatedBy assertion pairs with 78.5% reliability and 21,053 MemberOf assertions with 87.0% reliability. Our method surpassed the baseline system in both precision and the number of acquired assertions.

Notes

1. http://www.wiktionary.org/
2. http://www.wikipedia.org/
3. http://nadya.jp/
4. http://alaginrc.nict.go.jp/hyponymy/
5. http://www.tkl.iis.u-tokyo.ac.jp/~ynaga/pecco/
6. https://github.com/commonsense/conceptnet5/wiki/Relations
7. Curly brackets were used to mark the tags' representations.
8. To measure the level of agreement between judges, we used Randolph's free-marginal multirater kappa instead of Fleiss' fixed-marginal multirater kappa, because of the high-agreement, low-kappa paradox.
9. We adjusted the number of evaluated pairs to balance the proportion between the total number of pairs and the test sample.
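
The free-marginal kappa named in note 8 can be illustrated with a minimal sketch. It follows the standard definition of Randolph's statistic: observed agreement is computed as in Fleiss' kappa, but chance agreement is fixed at 1/k for k categories. The function name, the input layout and the sample ratings are assumptions made for illustration only; the number of judges and categories used in the paper is not stated on this page.

```python
from typing import Sequence


def free_marginal_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Randolph's free-marginal multirater kappa.

    counts[i][j] is the number of raters who assigned item i to category j
    (hypothetical input layout). Every item must be rated by the same
    number of raters.
    """
    n_items = len(counts)
    n_categories = len(counts[0])
    n_raters = sum(counts[0])
    if n_items == 0 or n_categories < 2 or n_raters < 2:
        raise ValueError("need at least 1 item, 2 categories and 2 raters")
    if any(len(row) != n_categories or sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same raters and categories")

    # Observed agreement: share of rater pairs that agree, averaged over items.
    agreeing_pairs = sum(c * (c - 1) for row in counts for c in row)
    p_observed = agreeing_pairs / (n_items * n_raters * (n_raters - 1))

    # Free-marginal chance agreement: categories are assumed equally likely.
    p_chance = 1.0 / n_categories
    return (p_observed - p_chance) / (1.0 - p_chance)


# Illustrative data: 4 items, 3 raters, categories (correct, incorrect).
sample = [[3, 0], [3, 0], [2, 1], [3, 0]]
kappa = free_marginal_kappa(sample)
```

Because the chance term does not depend on how often each category was actually chosen, heavily skewed ratings (for example, almost every pair judged correct) do not push the statistic down the way they can with a fixed-marginal kappa.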

Author information

Authors and Affiliations

Future Processing, ul. Bojkowska 37A, 44-100, Gliwice, Poland
Marek Krawczyk
Hokkaido University, Kita-ku, Kita 14, Nishi 9, Sapporo, Japan
Rafal Rzepka & Kenji Araki

Corresponding author

Correspondence to Rafal Rzepka.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
LIMSI-CNRS, Orsay Cedex, France
Joseph Mariani
Adam Mickiewicz University, Poznań, Poland
Marek Kubis

About this paper

Cite this paper

Krawczyk, M., Rzepka, R., Araki, K. (2018). Exploiting Wikipedia-Based Information-Rich Taxonomy for Extracting Location, Creator and Membership Related Information for ConceptNet Expansion. In: Vetulani, Z., Mariani, J., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2015. Lecture Notes in Computer Science, vol 10930. Springer, Cham. https://doi.org/10.1007/978-3-319-93782-3_19

DOI: https://doi.org/10.1007/978-3-319-93782-3_19
Published: 16 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93781-6
Online ISBN: 978-3-319-93782-3
eBook Packages: Computer Science; Computer Science (R0)
