Abstract
This work applies grammatical evolution to identify taxonomic hierarchies of concepts from Wikipedia. Each Wikipedia article covers a topic and is connected to related topics by hyperlinks. Hierarchical taxonomies, and their generalization to ontologies, are a highly useful resource for many applications because they enable semantic search and reasoning. Consequently, the automatic identification of taxonomies composed of concepts associated with linked Wikipedia pages has attracted much attention. We have developed a system that arranges a set of Wikipedia concepts into a taxonomy. The technique is based on the relationships among a set of features extracted from the contents of the Wikipedia pages. We use a grammatical evolution algorithm to discover the best way of combining these features in an explicit function. Each candidate function is evaluated by applying a genetic algorithm to approximate the optimal taxonomy that the function can provide for a number of training cases. Its fitness is the average, over the training cases, of the precision obtained by comparing the taxonomy provided by the function with the reference taxonomy. Experimental results show that the proposal yields functions that find high-quality taxonomies.
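The fitness computation described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: a taxonomy is represented as a set of (parent, child) links, and precision is taken as the fraction of predicted links that also appear in the reference taxonomy. The names `edge_precision`, `fitness`, and `training_cases` are hypothetical, and the predicted taxonomies are supplied directly rather than produced by the inner genetic algorithm.

```python
from typing import Iterable, Set, Tuple

# Assumption: a taxonomy is a set of (parent concept, child concept) links.
Edge = Tuple[str, str]


def edge_precision(predicted: Set[Edge], reference: Set[Edge]) -> float:
    """Fraction of predicted parent-child links that also occur in the reference."""
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)


def fitness(cases: Iterable[Tuple[Set[Edge], Set[Edge]]]) -> float:
    """Mean precision over (predicted, reference) taxonomy pairs."""
    scores = [edge_precision(pred, ref) for pred, ref in cases]
    return sum(scores) / len(scores) if scores else 0.0


# Toy training cases (invented for illustration, not taken from the paper).
training_cases = [
    # Every predicted link is correct: precision 1.0
    ({("Science", "Physics"), ("Science", "Biology")},
     {("Science", "Physics"), ("Science", "Biology")}),
    # One of two predicted links is correct: precision 0.5
    ({("Animal", "Dog"), ("Dog", "Cat")},
     {("Animal", "Dog"), ("Animal", "Cat")}),
]

print(fitness(training_cases))  # 0.75
```

In the system described above, each predicted taxonomy would instead be the best arrangement that a genetic algorithm finds for the candidate function under evaluation; only the averaging step is shown here.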
Notes
They are available at http://nlp.uned.es/~lurdes/wikipedia_data.
Acknowledgements
This work has been partially supported by the Spanish Ministry of Science and Innovation within the projects EXTRECM (TIN2013-46616-C2-2-R) and PROSA-MED (TIN2016-77820-C3-2-R), as well as by the Universidad Nacional de Educación a Distancia (UNED) through the FPI-UNED 2013 Grant. The authors would like to thank the referees for their valuable comments which led to improvements in the paper.
Ethics declarations
Conflict of interest
Lourdes Araujo declares that she has no conflict of interest. Juan Martinez-Romo declares that he has no conflict of interest. Andres Duque declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Araujo, L., Martinez-Romo, J. & Duque, A. Discovering taxonomies in Wikipedia by means of grammatical evolution. Soft Comput 22, 2907–2919 (2018). https://doi.org/10.1007/s00500-017-2544-4
DOI: https://doi.org/10.1007/s00500-017-2544-4