Shared State for Distributed Interactive Data Mining Applications

Distributed and Parallel Databases

Abstract

Advances in processor speed and network bandwidth have made distributed data mining applications that involve user interaction feasible. These applications are traditionally implemented using ad hoc communication protocols, which are often either cumbersome or inefficient. This paper presents and evaluates a system for sharing state among such interactive distributed data mining applications, developed with the goal of providing both ease of programming and efficiency. Our system, called InterAct, shares data efficiently by caching it, by communicating only the modified data, and by allowing relaxed coherence requirements to be specified to reduce communication overhead and data to be placed for improved locality, on a per-client and per-data-structure basis. In addition, the system can supply clients with consistent copies of shared data even while the data is being modified.
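To make the ideas of caching, sending only modified data, and client-controlled coherence concrete, the following is a minimal Python sketch. It is not InterAct's API or implementation: the names `Server`, `Client`, `diff_since`, and `max_stale_versions` are hypothetical, the staleness bound is counted in server versions as one possible way to express a relaxed coherence requirement, and the client reads the server's version number directly, whereas a real system would need a message or notification for that.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server: authoritative data plus a log of modified keys."""
    data: dict = field(default_factory=dict)
    version: int = 0
    # changes[i] holds the keys modified when moving from version i to i + 1.
    changes: list = field(default_factory=list)

    def update(self, new_items: dict) -> None:
        """Apply a batch of modifications and advance the version."""
        self.data.update(new_items)
        self.version += 1
        self.changes.append(set(new_items))

    def diff_since(self, client_version: int) -> dict:
        """Return only the entries modified after client_version."""
        modified = set().union(*self.changes[client_version:])
        return {key: self.data[key] for key in modified}


@dataclass
class Client:
    """Hypothetical client that caches data and tolerates bounded staleness."""
    server: Server
    max_stale_versions: int = 0  # 0 means: refresh whenever the cache is behind
    cache: dict = field(default_factory=dict)
    cached_version: int = 0
    entries_received: int = 0  # counts entries transferred, for illustration

    def read(self) -> dict:
        """Return the cached copy, refreshing only if it is too stale."""
        lag = self.server.version - self.cached_version
        if lag > self.max_stale_versions:
            diff = self.server.diff_since(self.cached_version)
            self.cache.update(diff)
            self.entries_received += len(diff)
            self.cached_version = self.server.version
        return self.cache


if __name__ == "__main__":
    server = Server()
    strict = Client(server, max_stale_versions=0)
    relaxed = Client(server, max_stale_versions=2)
    for batch in ({"a": 1, "b": 2}, {"a": 3}, {"c": 5}):
        server.update(batch)
        strict.read()
        relaxed.read()
    print(strict.entries_received, relaxed.entries_received)
```

In this sketch the relaxed client contacts the server less often and, because repeated modifications to the same entry are merged into one diff, receives fewer entries overall, at the cost of occasionally reading stale data. Deletions and concurrent updates are deliberately left out.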

We evaluate the performance of the system on a set of data mining applications that perform queries on data structures that summarize information from the databases of interest. We demonstrate that a runtime system such as InterAct yields a 10- to 30-fold improvement in execution time, owing to shared data caching, the applications' ability to tolerate stale data (client-controlled coherence), and the ability to off-load some of the computation from the server to the client. Performance improves without requiring complex communication protocols to be built into the application, since the runtime system uses knowledge about application behavior (encoded by specifying coherence requirements) to automatically optimize the resources used for communication. We also demonstrate that, for our benchmark tests, the quality of the results does not deteriorate significantly when more relaxed coherence protocols are used.

Cite this article

Parthasarathy, S., Dwarkadas, S. Shared State for Distributed Interactive Data Mining Applications. Distributed and Parallel Databases 11, 129–155 (2002). https://doi.org/10.1023/A:1013936118506

  • DOI: https://doi.org/10.1023/A:1013936118506
