Extracting the Main Path of Historic Events from Wikipedia

Network Intelligence Meets User Centered Social Media Networks (ENIC 2017)

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

The large online encyclopedia “Wikipedia” has become a valuable information resource. However, its size and the interconnectedness of its pages make it easy to get lost in detail and difficult to gain a good overview of a topic. As a solution, we propose a procedure to extract, summarize, and visualize large categories of historic Wikipedia articles. At the heart of this procedure, we apply the method of main path analysis, originally developed for citation networks, to a modified network of linked Wikipedia articles. Besides the aggregation method itself, we describe our data mining process for the Wikipedia datasets and the considerations that guided the visualization of the article networks. Finally, we present our web app, which allows users to experiment with the procedure on an arbitrary Wikipedia category.
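To make the idea of main path analysis concrete, here is a minimal sketch in Python. It is not the chapter's implementation: the abstract does not say which weighting or search variant the authors use, nor how they modify the article network. The sketch assumes the network is already acyclic (for instance because every link points from an earlier event to a later one), weights each edge by its search path count (the number of source-to-sink paths that run through it), and then picks the single path with the largest total weight (a "global" search). The function names and the toy edge list are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Return (nodes, successors, predecessors) for unique (u, v) edges."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))
    return nodes, succ, pred


def topological_order(nodes, succ, pred):
    """Kahn's algorithm; raises ValueError if the network has a cycle."""
    indegree = {n: len(pred[n]) for n in nodes}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("main path analysis needs an acyclic network")
    return order


def spc_weights(edges):
    """Search path count: source-to-sink paths passing through each edge."""
    nodes, succ, pred = build_graph(edges)
    order = topological_order(nodes, succ, pred)
    from_source = {}  # number of paths from any source to n
    for n in order:
        from_source[n] = sum(from_source[p] for p in pred[n]) if pred[n] else 1
    to_sink = {}  # number of paths from n to any sink
    for n in reversed(order):
        to_sink[n] = sum(to_sink[s] for s in succ[n]) if succ[n] else 1
    return {(u, v): from_source[u] * to_sink[v] for u, v in edges}


def main_path(edges):
    """Source-to-sink path with the largest summed SPC weight (global search)."""
    nodes, succ, pred = build_graph(edges)
    order = topological_order(nodes, succ, pred)
    weights = spc_weights(edges)
    best = {}  # best[n] = (best total weight of a path ending at n, previous node)
    for n in order:
        if pred[n]:
            best[n] = max((best[p][0] + weights[(p, n)], p) for p in pred[n])
        else:
            best[n] = (0, None)
    node = max((n for n in order if not succ[n]), key=lambda n: best[n][0])
    path = []
    while node is not None:
        path.append(node)
        node = best[node][1]
    return path[::-1]


# Toy network (made-up data): an edge (u, v) means "u leads to v".
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E"),
         ("D", "F"), ("E", "F"), ("F", "G"), ("D", "G")]
print(main_path(edges))  # ['A', 'C', 'D', 'F', 'G']
```

In the toy network there are five paths from A to G. The edge (F, G) lies on three of them, so its weight is 3, and the path A, C, D, F, G collects the largest total weight (10). A purely local search that always follows the heaviest outgoing edge would hit a tie at D, which is why the sketch uses the global variant.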

Notes

  1. http://www.ti.inf.uni-due.de/research/tools/wikimainpath/

  2. http://histography.io/

  3. http://wiki.polyfra.me/

  4. https://en.wikipedia.org/wiki/Wiki_markup

  5. https://dumps.wikimedia.org/enwiki/

  6. http://boost-spirit.com/

  7. Around 2 h for extracting the dates in ∼130 GB of Wikipedia pages.

  8. https://dumps.wikimedia.org/enwiki/20170501/

  9. Note that in the paper we use a different layout than for the web app.

Acknowledgements

We would like to thank Ulrich Hoppe and Stephanie Große for interesting and stimulating discussions on the topics of the paper. In addition, we would like to thank Issai Zaks for his help with the servers and overall technical support. Finally, we thank our anonymous reviewers for providing very valuable feedback and for pointing out missed references to related work. This work was supported by the Deutsche Forschungsgemeinschaft (DFG) under grant GRK 2167, Research Training Group “User-Centred Social Media.”

Author information

Corresponding author

Correspondence to Benjamin Cabrera.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Cabrera, B., König, B. (2018). Extracting the Main Path of Historic Events from Wikipedia. In: Alhajj, R., Hoppe, H., Hecking, T., Bródka, P., Kazienko, P. (eds) Network Intelligence Meets User Centered Social Media Networks. ENIC 2017. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-90312-5_5

  • DOI: https://doi.org/10.1007/978-3-319-90312-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90311-8

  • Online ISBN: 978-3-319-90312-5

  • eBook Packages: Social Sciences (R0)
