Leadership discovery when data correlatively evolve

World Wide Web

Abstract

The World Wide Web is full of rich information, including text, XML, multimedia, and time series data. The web is usually represented as a large graph, and PageRank is computed to rank the importance of web pages. In this paper, we study the problem of ranking evolving time series and discovering leaders among them by analyzing lead-lag relations. A time series is considered a leader if its rise or fall impacts the behavior of many other time series. At each time point, we compute the lagged correlation between each pair of time series and model these correlations as a graph. A leadership rank is then computed from the graph, which brings order to the time series, and the leaders are extracted based on this ranking. The problem is challenging because the dynamic nature of time series makes the graph that models their relationships highly evolving. We propose an efficient algorithm that tracks the lagged correlations and computes the leaders incrementally while still achieving good accuracy. Our experiments on real weather science data and stock data show that the algorithm computes time series leaders efficiently in real time and that the detected leaders have high predictive power for events affecting time series entities in general, which can inform both weather monitoring and financial risk control.
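The pipeline outlined in the abstract (pairwise lagged correlation, a graph over the time series, a PageRank-style rank) can be illustrated with a minimal batch sketch in Python. This is not the authors' incremental algorithm, which the abstract does not specify. All function names, the follower-to-leader edge direction, the correlation threshold, the maximum lag, and the damping factor are assumptions chosen for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def best_lagged_correlation(leader, follower, max_lag):
    """Return (lag, corr) with the largest |corr| between leader[t] and follower[t + lag]."""
    best = (0, 0.0)
    for lag in range(1, max_lag + 1):
        c = pearson(leader[:-lag], follower[lag:])
        if abs(c) > abs(best[1]):
            best = (lag, c)
    return best

def build_graph(series, max_lag, threshold):
    """Assumed convention: add an edge follower -> leader, weighted by |corr|,
    when the leader's lagged correlation passes the threshold and is stronger
    than the correlation in the opposite direction."""
    names = list(series)
    edges = {name: {} for name in names}
    for a in names:
        for b in names:
            if a == b:
                continue
            _, a_leads_b = best_lagged_correlation(series[a], series[b], max_lag)
            _, b_leads_a = best_lagged_correlation(series[b], series[a], max_lag)
            if abs(a_leads_b) >= threshold and abs(a_leads_b) > abs(b_leads_a):
                edges[b][a] = abs(a_leads_b)
    return edges

def leadership_rank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; nodes with no out-edges spread rank uniformly."""
    names = list(edges)
    n = len(names)
    rank = {v: 1.0 / n for v in names}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in names}
        for v in names:
            total = sum(edges[v].values())
            if total == 0:
                for u in names:
                    new[u] += damping * rank[v] / n
            else:
                for u, w in edges[v].items():
                    new[u] += damping * rank[v] * w / total
        rank = new
    return rank

# Toy data (made up): B repeats A one step later, C repeats A two steps later.
base = [1, 3, 2, 5, 4, 7, 3, 8, 6, 9, 2, 5]
series = {"A": base[2:12], "B": base[1:11], "C": base[0:10]}

edges = build_graph(series, max_lag=2, threshold=0.9)
rank = leadership_rank(edges)
leaders = sorted(rank, key=rank.get, reverse=True)
print(leaders)
```

In this toy setup rank flows from followers to the series they lag behind, so the series that moves first accumulates the highest score. A streaming setting like the one the paper targets would have to update the correlations and the rank as new points arrive rather than recompute them from scratch, which this sketch does not attempt.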

Author information

Corresponding author

Correspondence to Yiping Ke.

About this article

Cite this article

Wu, D., Ke, Y., Yu, J.X. et al. Leadership discovery when data correlatively evolve. World Wide Web 14, 1–25 (2011). https://doi.org/10.1007/s11280-010-0095-z

  • DOI: https://doi.org/10.1007/s11280-010-0095-z
