DOI: 10.1145/2467696.2467721
research-article

Reading the correct history?: modeling temporal intention in resource sharing

Published: 22 July 2013

ABSTRACT

The web is trapped in the "perpetual now": when users traverse from page to page, they see the state of the web resource (i.e., the page) as it exists at the time of the click, not necessarily as it existed when the link was made. Thus, a temporal discrepancy can arise between the resource at the time the page author created a link to it and the time when a reader follows the link. This is especially important in the context of social media: the ease of sharing links in a tweet or Facebook post allows many people to author web content, but space constraints combined with poor awareness by authors often prevent sufficient context from being generated to determine the intent of the post. If the links are clicked as soon as they are shared, the temporal distance between sharing and clicking is so small that there is little to no difference in content. However, not all clicks occur immediately, and a delay of days or even hours can result in reading something other than what the author intended. We introduce the concept of a user's temporal intention upon publishing a link in social media. We investigate the features that could be extracted from the post, the linked resource, and the patterns of social dissemination to model this user intention. Finally, we analyze the historical integrity of the shared resources in social media across time. In other words, how beneficial is knowledge of the author's intent in maintaining the consistency of the story being told through social posts and in enriching the coverage and depth of archived content for vulnerable resources?

          Published in

            JCDL '13: Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
            July 2013
            480 pages
            ISBN: 9781450320771
            DOI: 10.1145/2467696

            Copyright © 2013 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            JCDL '13 paper acceptance rate: 28 of 95 submissions (29%). Overall acceptance rate: 415 of 1,482 submissions (28%).