DOI: 10.1145/3105831.3105866
research-article

Quantification of time in Digital Libraries: Temporal Zipf's law

Published: 12 July 2017

Abstract

The temporal dimension of a text document defines the temporal scope of the events it narrates. This dimension becomes more important in corpora produced over several years, such as digital libraries. Temporal aspects of text have been studied extensively for specific tasks, notably information retrieval and event detection, but no study has quantified and analyzed the richness of the temporal dimension across different text collections. Analyzing thirteen text collections, we show that the extent and characteristics of the presence of time in text vary among collections with different scopes, although time intervals are mentioned in almost all the text units analyzed. We found that unique intervals follow the same distribution, given by Zipf's law, that holds for single words.
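To make the Zipf's law claim concrete, the following is a minimal sketch of how a rank-frequency check on time intervals could look. It is not the authors' procedure: the `(start_year, end_year)` representation, the toy data, and the function names `rank_frequency` and `zipf_exponent` are all assumptions made for illustration, and the exponent is estimated with a plain least-squares fit on log-log points, which is only one of several possible fitting methods.

```python
import math
from collections import Counter


def rank_frequency(items):
    """Return occurrence counts sorted in descending order (rank 1 first)."""
    return sorted(Counter(items).values(), reverse=True)


def zipf_exponent(frequencies):
    """Estimate s in f(r) ~ C / r**s by least squares on (log r, log f).

    Assumption: a simple log-log linear fit, chosen for brevity.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the fitted slope equals -s


# Hypothetical toy data: each tuple is one mention of a (start_year, end_year)
# interval. The counts are chosen so that frequency = 60 / rank.
intervals = (
    [(2001, 2001)] * 60
    + [(1939, 1945)] * 30
    + [(1990, 1999)] * 20
    + [(1914, 1918)] * 15
    + [(2010, 2012)] * 12
)

freqs = rank_frequency(intervals)
exponent = zipf_exponent(freqs)
print(freqs, round(exponent, 3))
```

On data that follows an ideal Zipf distribution, as the toy counts here do by construction, the estimated exponent is close to 1. Real collections would need interval extraction from text first and a more careful fit.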



Published In

IDEAS '17: Proceedings of the 21st International Database Engineering & Applications Symposium
July 2017
338 pages
ISBN:9781450352208
DOI:10.1145/3105831

In-Cooperation

  • University of the West of England
  • BytePress
  • Concordia University

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Quantification
  2. Temporal dimension
  3. Temporal expressions
  4. Text collections
  5. Zipf's law

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IDEAS 2017

Acceptance Rates

IDEAS '17 Paper Acceptance Rate: 38 of 102 submissions, 37%
Overall Acceptance Rate: 74 of 210 submissions, 35%
