ABSTRACT
Extracting faceted taxonomies from the Web has received increasing attention from the web mining community in recent years. We demonstrate DFT-Extractor, a novel system that automatically constructs domain-specific faceted taxonomies from Wikipedia in three steps: 1) it crawls domain terms from Wikipedia using a modified topical crawler; 2) it extracts hyponym relations using a classification model with motif-based features; 3) it constructs a faceted taxonomy by applying a community detection algorithm and a set of heuristic rules. DFT-Extractor also provides a graphical user interface that visualizes the learned hyponym relations and the tree structure of the taxonomies.
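The data flow of the three steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the DFT-Extractor implementation: the toy link graph `LINKS` stands in for Wikipedia, a depth-bounded breadth-first crawl stands in for the modified topical crawler, a reciprocal-link rule stands in for the trained classifier with motif-based features, and the final step only assembles the tree, omitting community detection and the heuristic rules. The function names `crawl`, `extract_hyponyms`, and `build_taxonomy` are hypothetical.

```python
from collections import deque

# Toy link graph standing in for Wikipedia page links (hypothetical data).
LINKS = {
    "Data structure": ["Tree", "Graph", "Array"],
    "Tree": ["Binary tree", "Data structure"],
    "Binary tree": ["Tree"],
    "Graph": ["Data structure"],
    "Array": ["Data structure"],
}


def crawl(seed, links, max_depth=2):
    """Step 1 stand-in: breadth-first crawl bounded by depth from the seed.

    Returns a dict mapping each collected term to its depth.
    """
    depths = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if depths[page] == max_depth:
            continue  # do not expand pages at the depth limit
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def extract_hyponyms(depths, links):
    """Step 2 stand-in: a hand-written rule instead of a trained classifier.

    A link parent -> child is kept when the two pages link to each other
    (a very small link pattern) and the child sits one level deeper.
    """
    relations = []
    for parent, targets in links.items():
        for child in targets:
            if parent not in depths or child not in depths:
                continue
            reciprocal = parent in links.get(child, [])
            if reciprocal and depths[child] == depths[parent] + 1:
                relations.append((parent, child))
    return relations


def build_taxonomy(relations):
    """Step 3 stand-in: assemble a parent -> children tree only.

    Community detection and heuristic rules are deliberately omitted.
    """
    children = {}
    for parent, child in relations:
        children.setdefault(parent, []).append(child)
    return children


depths = crawl("Data structure", LINKS)
relations = extract_hyponyms(depths, LINKS)
taxonomy = build_taxonomy(relations)
```

In this toy setting each child of the seed term could be read as one candidate facet; the actual system groups terms with a community detection algorithm instead.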
Index Terms
- DFT-Extractor: a system to extract domain-specific faceted taxonomies from Wikipedia