DOI: 10.1145/2487788.2487922
demonstration

DFT-extractor: a system to extract domain-specific faceted taxonomies from wikipedia

Published: 13 May 2013

ABSTRACT

Extracting faceted taxonomies from the Web has received increasing attention from the web mining community in recent years. We demonstrate DFT-Extractor, a novel system that automatically constructs domain-specific faceted taxonomies from Wikipedia in three steps: 1) it crawls domain terms from Wikipedia using a modified topical crawler; 2) it applies a classification model with motif-based features to extract hyponym relations; and 3) it constructs a faceted taxonomy by applying a community detection algorithm and a set of heuristic rules. DFT-Extractor also provides a graphical user interface for visualizing the learned hyponym relations and the tree structure of the taxonomies.
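To make the third step more concrete, the following is a minimal sketch of how a set of already-extracted hyponym relations might be grouped into facets and rendered as trees. It is not the system's actual implementation: the function names `build_faceted_taxonomy` and `render` are hypothetical, connected components are used as a simple stand-in for the community detection algorithm, and "a term with no hypernym is a facet root" is an assumed heuristic. The crawling and classification steps are not shown.

```python
from collections import defaultdict


def build_faceted_taxonomy(hyponym_pairs):
    """Group (hyponym, hypernym) pairs into facets.

    Assumption: connected components stand in for community detection,
    and terms without any hypernym are treated as facet roots.
    """
    children = defaultdict(set)   # hypernym -> set of direct hyponyms
    neighbors = defaultdict(set)  # undirected adjacency, used for grouping
    has_parent = set()
    for hypo, hyper in hyponym_pairs:
        children[hyper].add(hypo)
        neighbors[hypo].add(hyper)
        neighbors[hyper].add(hypo)
        has_parent.add(hypo)

    seen = set()
    facets = []
    for term in sorted(neighbors):
        if term in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [term], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        roots = sorted(t for t in component if t not in has_parent)
        facets.append({"roots": roots, "terms": sorted(component)})
    return facets, children


def render(term, children, depth=0):
    """Return indented lines for the subtree under `term` (assumes no cycles)."""
    lines = ["  " * depth + term]
    for child in sorted(children.get(term, ())):
        lines.extend(render(child, children, depth + 1))
    return lines


# Toy input, invented for illustration only.
pairs = [
    ("binary tree", "tree"),
    ("AVL tree", "binary tree"),
    ("quicksort", "sorting algorithm"),
    ("merge sort", "sorting algorithm"),
]
facets, children = build_faceted_taxonomy(pairs)
for facet in facets:
    for root in facet["roots"]:
        print("\n".join(render(root, children)))
```

On the toy input, the sketch yields two facets, one rooted at "tree" and one at "sorting algorithm". A real community detection method would typically split a densely linked graph more finely than connected components do.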

    • Published in

      WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web
      May 2013
      1636 pages
      ISBN: 9781450320382
      DOI: 10.1145/2487788

      Copyright © 2013 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      WWW '13 Companion Paper Acceptance Rate: 831 of 1,250 submissions, 66%
      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
