DOI: 10.1145/1007568.1007586

Identifying similarities, periodicities and bursts for online search queries

Published: 13 June 2004

Abstract

We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts'), where each element of the time series is the number of times the query is issued on a given day. All of the methods we describe use sequences of this form and can be applied to time-series data generally. Our primary goal is the discovery of semantically similar queries, which we achieve by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state of the art in time-series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time series. Finally, we propose a simple but effective method for identifying bursts (long- or short-term). Using the burst information extracted from a sequence, we are able to efficiently perform 'query-by-burst' on the database of time series. We conclude with a description of a tool that uses the described methods and serves as an interactive exploratory data discovery tool for the MSN query database.
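
To make the three ingredients named in the abstract more concrete, the following is a minimal Python sketch using NumPy. It is not the authors' implementation: the abstract does not give the exact algorithms, so the function names (`best_fourier_features`, `dominant_period`, `burst_days`), the standardization step, the plain periodogram peak, and the moving-average threshold rule with its `window` and `cutoff` parameters are all assumptions chosen for illustration.

```python
import numpy as np


def best_fourier_features(x, k):
    """Keep the k largest-magnitude DFT coefficients plus the omitted energy."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / (x.std() or 1.0)      # standardize; guard flat series
    coeffs = np.fft.fft(z, norm="ortho")       # orthonormal DFT preserves energy
    idx = np.sort(np.argsort(np.abs(coeffs))[-k:])   # positions of the k largest
    kept = coeffs[idx]
    # Energy of everything that was dropped (assumed summary of the omitted part).
    omitted_energy = float(np.sum(np.abs(coeffs) ** 2) - np.sum(np.abs(kept) ** 2))
    return idx, kept, omitted_energy


def dominant_period(x):
    """Return the period (in samples) of the strongest periodogram peak."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    peak = 1 + int(np.argmax(power[1:]))       # skip the zero-frequency term
    return len(x) / peak


def burst_days(x, window=3, cutoff=2.0):
    """Flag days whose moving average exceeds mean + cutoff * std of that average."""
    x = np.asarray(x, dtype=float)
    # mode="same" zero-pads, so the first and last few averages are biased low.
    ma = np.convolve(x, np.ones(window) / window, mode="same")
    return np.flatnonzero(ma > ma.mean() + cutoff * ma.std())


if __name__ == "__main__":
    days = np.arange(364)
    # Hypothetical daily counts with a weekly cycle and a five-day spike.
    counts = 100 + 20 * np.sin(2 * np.pi * days / 7)
    counts[200:205] += 300
    idx, kept, omitted = best_fourier_features(counts, k=8)
    print("kept coefficient positions:", idx)
    print("omitted energy:", omitted)
    print("dominant period (days):", dominant_period(counts))
    print("burst days:", burst_days(counts))
```

Because the DFT here is orthonormal, the kept and omitted energies sum to the energy of the standardized series, which is what lets the omitted energy act as a compact summary of the discarded coefficients. For a real-valued series the largest coefficients come in conjugate pairs, so a practical variant would likely store only one of each pair.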

    Published In

    SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
    June 2004
    988 pages
    ISBN:1581138598
    DOI:10.1145/1007568
    Publisher

    Association for Computing Machinery, New York, NY, United States

    Conference

    SIGMOD/PODS04
    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2025) Transport-Related Synthetic Time Series: Developing and Applying a Quality Assessment Framework. Sustainability 17:3 (1212). DOI: 10.3390/su17031212. Online publication date: 2-Feb-2025
    • (2025) Tela: A Temporal Load-Aware Cloud Virtual Disk Placement Scheme. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (1084-1100). DOI: 10.1145/3669940.3707252. Online publication date: 3-Feb-2025
    • (2024) Report on the Search Futures Workshop at ECIR 2024. ACM SIGIR Forum 58:1 (1-41). DOI: 10.1145/3687273.3687288. Online publication date: 7-Aug-2024
    • (2023) BurstSketch: Finding Bursts in Data Streams. IEEE Transactions on Knowledge and Data Engineering 35:11 (11126-11140). DOI: 10.1109/TKDE.2022.3223686. Online publication date: 1-Nov-2023
    • (2023) Robust Dominant Periodicity Detection for Time Series with Missing Data. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1-5). DOI: 10.1109/ICASSP49357.2023.10095241. Online publication date: 4-Jun-2023
    • (2023) Web Mining. Machine Learning for Data Science Handbook (447-467). DOI: 10.1007/978-3-031-24628-9_20. Online publication date: 26-Feb-2023
    • (2022) A tail-tolerant cloud storage scheduling based on precise periodicity detection. CCF Transactions on High Performance Computing 4:3 (321-338). DOI: 10.1007/s42514-022-00099-8. Online publication date: 23-May-2022
    • (2022) Time Series Modeling of Methane Gas in Underground Mines. Mining, Metallurgy & Exploration. DOI: 10.1007/s42461-022-00654-5. Online publication date: 2-Aug-2022
    • (2021) RobustPeriod: Robust Time-Frequency Mining for Multiple Periodicity Detection. Proceedings of the 2021 International Conference on Management of Data (2328-2337). DOI: 10.1145/3448016.3452779. Online publication date: 9-Jun-2021
    • (2021) Event Occurrence Date Estimation based on Multivariate Time Series Analysis over Temporal Document Collections. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (398-407). DOI: 10.1145/3404835.3462885. Online publication date: 11-Jul-2021