DOI: 10.1145/1557019.1557075
research-article

On burstiness-aware search for document sequences

Published: 28 June 2009

Abstract

As the number and size of large timestamped collections (e.g. sequences of digitized newspapers, periodicals, blogs) increase, the problem of efficiently indexing and searching such data becomes more important. Term burstiness has been extensively researched as a mechanism to address event detection in the context of such collections. In this paper, we explore how burstiness information can be further utilized to enhance the search process. We present a novel approach to model the burstiness of a term, using discrepancy theory concepts. This allows us to build a parameter-free, linear-time approach to identify the time intervals of maximum burstiness for a given term. Finally, we describe the first burstiness-driven search framework and thoroughly evaluate our approach in the context of different scenarios.
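The abstract does not spell out the burstiness model, so the following is only a minimal sketch of how a parameter-free, linear-time search for a term's most bursty interval could look. It assumes (this is an illustration, not the paper's confirmed definition) that an interval's score is the share of the term's total frequency falling inside it minus the share of the timeline it covers, and it finds the best interval with a standard maximum-sum segment scan. The function name `max_burst_interval` and the sample counts are hypothetical.

```python
from typing import List, Tuple


def max_burst_interval(freqs: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, score) of the contiguous interval with the
    largest gap between observed and expected term frequency.

    Assumption (not taken from the abstract): the expected frequency is
    uniform over the timeline, so each timestamp contributes
    freq / total - 1 / n to the score of any interval containing it.
    """
    n = len(freqs)
    total = sum(freqs)
    if n == 0 or total == 0:
        raise ValueError("need a non-empty sequence with a positive total")

    # Per-timestamp contribution: observed share minus uniform share.
    scores = [f / total - 1.0 / n for f in freqs]

    # Maximum-sum segment scan (Kadane): one pass, no tunable parameters.
    best_start, best_end, best_sum = 0, 0, scores[0]
    cur_start, cur_sum = 0, 0.0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart here.
            cur_start, cur_sum = i, s
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_start, best_end, best_sum = cur_start, i, cur_sum
    return best_start, best_end, best_sum


# Hypothetical weekly counts of one term across ten timestamps.
weekly_counts = [1, 0, 2, 9, 12, 8, 1, 0, 1, 1]
start, end, score = max_burst_interval(weekly_counts)
print(start, end, round(score, 3))  # prints: 3 5 0.529
```

In this toy sequence the scan picks timestamps 3 through 5, which hold 29 of the 35 occurrences while covering only 3 of the 10 timestamps.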

Supplementary Material

JPG File (p477-lappas.jpg)
MP4 File (p477-lappas.mp4)

Published In

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009
1426 pages
ISBN: 9781605584959
DOI: 10.1145/1557019
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. burstiness
  2. document sequences
  3. search

Qualifiers

  • Research-article

Conference

KDD09

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 8
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 18 Feb 2025

Citations

Cited By

  • (2023) BurstSketch: Finding Bursts in Data Streams. IEEE Transactions on Knowledge and Data Engineering, 35:11 (11126-11140). DOI: 10.1109/TKDE.2022.3223686. Online publication date: 1-Nov-2023
  • (2023) Stigmergy in Open Collaboration: An Empirical Investigation Based on Wikipedia. Journal of Management Information Systems, 40:3 (983-1008). DOI: 10.1080/07421222.2023.2229119. Online publication date: 23-Aug-2023
  • (2022) Reverse spatial top-k keyword queries. The VLDB Journal, 32:3 (501-524). DOI: 10.1007/s00778-022-00759-9. Online publication date: 25-Jul-2022
  • (2021) Burstiness-Aware Web Search Analysis on Different Levels of Evidences. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2021.3109304. Online publication date: 2021
  • (2021) The automatic approach for scientific papers dating. 2021 Ivannikov Ispras Open Conference (ISPRAS) (107-113). DOI: 10.1109/ISPRAS53967.2021.00020. Online publication date: Dec-2021
  • (2020) Bursts of Activity: Temporal Patterns of Help-Seeking and Support in Online Mental Health Forums. Proceedings of The Web Conference 2020 (2906-2912). DOI: 10.1145/3366423.3380056. Online publication date: 20-Apr-2020
  • (2020) Using Productive Collaboration Bursts to Analyze Open Source Collaboration Effectiveness. 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER) (400-410). DOI: 10.1109/SANER48275.2020.9054852. Online publication date: Feb-2020
  • (2020) The network-untangling problem: from interactions to activity timelines. Data Mining and Knowledge Discovery. DOI: 10.1007/s10618-020-00717-5. Online publication date: 3-Oct-2020
  • (2019) MVP: Finding the Most Valuable Posts in Financial Social Networks. 2019 IEEE 13th International Conference on Semantic Computing (ICSC) (352-355). DOI: 10.1109/ICOSC.2019.8665516. Online publication date: Jan-2019
  • (2019) Answering unique topic queries with dynamic threshold. World Wide Web, 22:1 (39-58). DOI: 10.1007/s11280-018-0528-7. Online publication date: 1-Jan-2019
