ABSTRACT
Wikipedia is an open-content encyclopedia that receives billions of page views per month. It has been observed that Wikipedia users visit multiple articles in a single reading session. To mitigate information overload and information loss, the research community has shown growing interest in approaches that present only the necessary information to users. Automatic generation of personalized summaries is a well-established remedy for the information overload problem. In this paper, we propose a technique to generate personalized summaries of Wikipedia articles by analyzing users' reading patterns. To analyze reading patterns, we track the reader's eye gaze during the article reading session. Gaze analysis identifies how the reader's attention is distributed over an article. We extend the proposed approach to generate a summary of the multiple articles visited during a user's Wikipedia reading session. We also capture a dataset representing the reading patterns of Wikipedia users and make it publicly available to the research community.¹
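The abstract does not specify how the attention distribution is converted into a summary. As a minimal sketch, assuming fixations have already been mapped to sentence indices and that total fixation duration (dwell time) is used as the attention signal, the Python below shows one way an attention distribution could drive an extractive summary of a whole reading session. The names `Fixation`, `attention_distribution`, and `session_summary` are hypothetical, and top-k sentence selection is an illustrative choice, not the WikiGaze method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fixation:
    # Hypothetical record: one fixation already mapped to a sentence of an article.
    article: str        # title of the article being read
    sentence: int       # index of the fixated sentence within that article
    duration_ms: float  # fixation duration in milliseconds


def attention_distribution(fixations):
    """Sum fixation durations per (article, sentence) and normalize them to sum to 1."""
    totals = defaultdict(float)
    for f in fixations:
        totals[(f.article, f.sentence)] += f.duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {key: t / grand_total for key, t in totals.items()}


def session_summary(articles, fixations, k):
    """Return the k most-attended sentences of a session, in reading order.

    Hypothetical interface: `articles` maps each visited article title to its
    list of sentences, and dict insertion order is taken as the visit order.
    """
    attention = attention_distribution(fixations)
    # Highest attention first; ties broken by (article, sentence) for determinism.
    top = sorted(attention, key=lambda key: (-attention[key], key))[:k]
    visit_order = {title: i for i, title in enumerate(articles)}
    # Re-sort the selected sentences by article visit order, then by position in the article.
    top.sort(key=lambda key: (visit_order[key[0]], key[1]))
    return [articles[title][idx] for title, idx in top]
```

For example, if a reader spends 900 ms on sentence 2 of the first article, 500 ms on sentence 1 of the second, and 300 ms on sentence 0 of the first, `session_summary(articles, fixations, 3)` returns those three sentences ordered first by article and then by position. Dwell time is only one possible proxy for attention; fixation counts or regressions could be substituted in `attention_distribution` without changing the selection step.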
Index Terms
- WikiGaze: Gaze-based Personalized Summarization of Wikipedia Reading Session