skip to main content
10.1145/2682571.2797073acmconferencesArticle/Chapter ViewAbstractPublication PagesdocengConference Proceedingsconference-collections
short-paper

Filling the Gaps: Improving Wikipedia Stubs

Published:08 September 2015Publication History

ABSTRACT

The availability of only a limited number of contributors on Wikipedia cannot ensure consistent growth and improvement of the online encyclopedia. With information being scattered on the web, our goal is to automate the process of generation of content for Wikipedia. In this work, we propose a technique of improving stubs on Wikipedia that do not contain comprehensive information. A classifier learns features from the existing comprehensive articles on Wikipedia and recommends content that can be added to the stubs to improve the completeness of such stubs. We conduct experiments using several classifiers - Latent Dirichlet Allocation (LDA) based model, a deep learning based architecture (Deep belief network) and TFIDF based classifier. Our experiments reveal that the LDA based model outperforms the other models (~6% F-score). Our generation approach shows that this technique is capable of generating comprehensive articles. ROUGE-2 scores of the articles generated by our system outperform the articles generated using the baseline. Content generated by our system has been appended to several stubs and successfully retained in Wikipedia.

References

  1. S. Banerjee, C. Caragea, and P. Mitra. Playscript classification and automatic wikipedia play articles generation. In Proceedings of the 22nd International Conference on Pattern Recognition (ICPR), pages 3630--3635. IEEE, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Banerjee and P. Mitra. Wikikreator: Improving wikipedia stubs automatically. In Proceedings of the Joint Conference of the 53rd Annual Meeting of the ACL and the 7th International Joint Conference on Natural Language Processing of the AFNLP. Association for Computational Linguistics, 2015.Google ScholarGoogle Scholar
  3. D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. the Journal of machine Learning research, 3:993--1022, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Y.-l. Boureau, Y. L. Cun, et al. Sparse feature learning for deep belief networks. In Advances in neural information processing systems, pages 1185--1192, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. Clarke and M. Lapata. Global inference for sentence compression: An integer linear programming approach. J. Artif. Intell. Res.(JAIR), 31:399--429, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. G. Erkan and D. R. Radev. Lexrank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res.(JAIR), 22(1):457--479, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Kohlschütter, P. Fankhauser, and W. Nejdl. Boilerplate detection using shallow text features. In Proceedings of the third ACM international conference on Web search and data mining, pages 441--450. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Q. Le and T. Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1188--1196, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. P. Li, Y. Wang, and J. Jiang. Automatically building templates for entity summary construction. Information Processing & Management, 49(1):330--340, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C.-Y. Lin. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, pages 74--81, 2004.Google ScholarGoogle Scholar
  11. A. Nenkova, S. Maskey, and Y. Liu. Automatic summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts of ACL 2011, page 3. Association for Computational Linguistics, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Sauper and R. Barzilay. Automatically generating wikipedia articles: A structure-aware approach. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1, pages 208--216. Association for Computational Linguistics, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning. Kea: Practical automatic keyphrase extraction. In Proceedings of the fourth ACM conference on Digital libraries, pages 254--255. ACM, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. C. Yao, X. Jia, S. Shou, S. Feng, F. Zhou, and H. Liu. Autopedia: automatic domain-independent wikipedia article generation. In Proceedings of the 20th international conference companion on World wide web, pages 161--162. ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Filling the Gaps: Improving Wikipedia Stubs

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      DocEng '15: Proceedings of the 2015 ACM Symposium on Document Engineering
      September 2015
      248 pages
      ISBN:9781450333078
      DOI:10.1145/2682571

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 8 September 2015

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • short-paper

      Acceptance Rates

      DocEng '15 Paper Acceptance Rate11of31submissions,35%Overall Acceptance Rate178of537submissions,33%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader