Ecosystem on the Web: non-linear mining and forecasting of co-evolving online activities

World Wide Web

Abstract

Given a large collection of co-evolving online activities, such as searches for the keywords “Xbox”, “PlayStation” and “Wii”, how can we find patterns and rules? Are these keywords related? If so, are they competing against each other? Can we forecast the volume of user activity for the coming month? We conjecture that online activities compete for user attention in the same way that species in an ecosystem compete for food. We present EcoWeb (i.e., Ecosystem on the Web), an intuitive model designed as a non-linear dynamical system for mining large-scale co-evolving online activities. Our second contribution is a novel, parameter-free, and scalable fitting algorithm, EcoWeb-Fit, that estimates the parameters of EcoWeb. Extensive experiments on real data show that EcoWeb is effective, in that it can capture long-range dynamics and meaningful patterns such as seasonalities, and practical, in that it can provide accurate long-range forecasts. EcoWeb consistently outperforms existing methods in terms of both accuracy and execution speed.
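The competition conjecture can be illustrated with a plain competitive Lotka-Volterra system, the model family the notes refer to. The sketch below is not EcoWeb or EcoWeb-Fit: it is a minimal forward-Euler simulation written for illustration. The symbols P, K and a follow the notes, while the growth rate `r`, the step size `dt`, the function name `simulate_competition` and all numeric values are assumptions introduced here.

```python
def simulate_competition(p0, r, K, a, steps, dt=0.05):
    """Forward-Euler integration of a competitive Lotka-Volterra system:

        dP_i/dt = r_i * P_i * (K_i - sum_j a_ij * P_j) / K_i

    p0: initial popularities, r: growth rates (assumed parameter),
    K: maximum popularity sizes, a: interaction matrix with a_ij >= 0.
    Returns the list of popularity vectors, one per time step.
    """
    d = len(p0)
    p = list(p0)
    history = [list(p)]
    for _ in range(steps):
        # User resources still available to keyword i: K_i - sum_j a_ij * P_j
        free = [K[i] - sum(a[i][j] * p[j] for j in range(d)) for i in range(d)]
        # Growth is proportional to current popularity and to free resources;
        # clamp at zero so a coarse step can never yield negative popularity.
        p = [max(0.0, p[i] + dt * r[i] * p[i] * free[i] / K[i]) for i in range(d)]
        history.append(list(p))
    return history


if __name__ == "__main__":
    # Two hypothetical keywords: keyword 0 is hurt strongly by keyword 1
    # (a_01 = 1.5), while keyword 1 is hurt only weakly by keyword 0 (a_10 = 0.5).
    trajectory = simulate_competition(
        p0=[0.1, 0.01],
        r=[1.0, 1.0],
        K=[1.0, 1.0],
        a=[[1.0, 1.5], [0.5, 1.0]],
        steps=4000,
    )
    print("final popularities:", trajectory[-1])
```

With these made-up coefficients the late-starting keyword eventually displaces the early leader, which is the kind of competitive pattern the abstract asks about. Forward Euler is used only to keep the sketch dependency-free; a proper ODE solver would be the safer choice for stiff or long-range runs.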

Notes

  1. Available at http://www.cs.kumamoto-u.ac.jp/~yasuko/software.html

  2. http://www.google.com/trends/

  3. Image courtesy of xura, criminalatt, David Castillo Dominici, happykanppy at FreeDigitalPhotos.net.

  4. There are several variations of the Lotka-Volterra model, e.g., the predator-prey/parasitism model. However, in this paper, we only focus on the simplest case where \( a_{ij} \geq 0 \) \( (i \neq j) \) for all species i and j (i.e., neutralism/amensalism/competition).

  5. For example, given N users, there are N × 24 hours/resources per day, or fewer, depending on the keyword and the demographic group it appeals to.

  6. In this paper, we assume that P(t) is the popularity density of a keyword, i.e., 0 ≤ P(t) ≤ 1. However, our equations can also handle other settings, such as the actual numbers of keyword appearances.

  7. Equivalently, the amount of user resources available to keyword i, whose maximum popularity size is \( K_{i} \), is \( K_{i} - {\sum }_{j=1}^{d} a_{ij} P_{j}(t). \)

  8. Here, \(\log ^{*}\) is the universal code length for integers.

  9. We digitize each floating-point number into \( c_{F} = 8 \) bits.

  10. Here, \( \mu \) and \( \sigma^{2} \) need \( 2c_{F} \) bits, but we can eliminate them because they are constant values and independent of our modeling.
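The two coding costs mentioned in notes 8 and 9 can be sketched as follows. This assumes the usual definition of the universal code length for integers, \( \log^{*}(n) = \log_2 c_0 + \log_2 n + \log_2 \log_2 n + \cdots \) (positive terms only, with \( c_0 \approx 2.865 \)); the notes do not spell out the formula, so treat that definition, the helper names `log_star` and `parameter_cost`, and the example numbers as assumptions made for illustration.

```python
import math

# Normalizing constant of the universal prior for integers (assumed definition).
C0 = 2.865064


def log_star(n: int) -> float:
    """Universal code length, in bits, of an integer n >= 1.

    Sums log2(n), log2(log2(n)), ... while the terms stay positive,
    then adds log2(C0).
    """
    if n < 1:
        raise ValueError("log_star is defined for integers n >= 1")
    bits = math.log2(C0)
    term = math.log2(n)
    while term > 0:
        bits += term
        term = math.log2(term)
    return bits


def parameter_cost(num_floats: int, c_f: int = 8) -> int:
    """Bits needed to store num_floats values digitized to c_f bits each."""
    return num_floats * c_f


if __name__ == "__main__":
    # Hypothetical example: cost of stating a count of 16, plus 3 float parameters.
    print("log*(16) bits:", log_star(16))
    print("3 floats at c_F = 8:", parameter_cost(3), "bits")
```

Because the code length grows only slightly faster than \( \log_2 n \), encoding an integer this way stays cheap even when no upper bound on it is known in advance.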

Acknowledgments

The authors would like to thank Christina Cowan for her help with interpreting the patterns of apparel companies. This work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research Numbers 15H02705, 26730060, and 26280112. This material is based upon work supported by the National Science Foundation under Grants No. CNS-1314632 and IIS-1408924; by the Army Research Laboratory (ARL) under Cooperative Agreement Number W911NF-09-2-0053; and by a Google Focused Research Award. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, ARL, or other funding parties. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Author information

Corresponding author

Correspondence to Yasuko Matsubara.

About this article

Cite this article

Matsubara, Y., Sakurai, Y. & Faloutsos, C. Ecosystem on the Web: non-linear mining and forecasting of co-evolving online activities. World Wide Web 20, 439–465 (2017). https://doi.org/10.1007/s11280-016-0389-x

  • DOI: https://doi.org/10.1007/s11280-016-0389-x
