Abstract
In this paper, we explore the complexity of mapping filtering streaming applications on large-scale homogeneous and heterogeneous platforms, with a particular emphasis on communication models and their impact. Filtering applications are streaming applications where each node also has a selectivity which either increases or decreases the size of its input data set. This selectivity makes the problem of scheduling these applications more challenging than the more studied problem of scheduling “non-filtering” streaming workflows. We address the complexity of the following two problems:
- Evaluation: Given a mapping of nodes to processors, how can one compute the period and latency?
- Optimization: Given a filtering workflow, how can one compute the mapping and schedule that minimize the period or latency? A solution to this problem requires generating both the mapping and the associated operation list, that is, the order in which each processor executes its assigned tasks.
We address this general problem in two steps. First, we address the simplified model without communication cost. In this case, the evaluation problems are easy, and the optimization problems have polynomial complexity on homogeneous platforms. However, we show that the optimization problems become NP-hard on heterogeneous platforms. Second, we consider platforms with communication costs. Clearly, due to the previous results, the optimization problems on heterogeneous platforms are still NP-hard. Therefore we come back to homogeneous platforms and extend the framework with three significant realistic communication models. Now even evaluation problems become difficult, because the mapping must now be enriched with an operation list that provides the time-steps at which each computation and each communication occurs in the system: determining the best operation list has a combinatorial nature. Not too surprisingly, optimization problems are NP-hard too. Altogether, this paper provides a comprehensive overview of the additional difficulties induced by heterogeneity and communication costs.
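To make the evaluation problem without communication costs concrete, the following is a minimal sketch for the simplest case only. It assumes a linear chain of services mapped one-to-one onto processors, where each service has a computation cost per unit of input data and a selectivity that scales the data size passed to its successor. The function name `evaluate_chain` and its parameters are hypothetical and chosen for illustration; the paper's actual model covers general workflows and is not reproduced here.

```python
def evaluate_chain(costs, selectivities, speeds, input_size=1.0):
    """Return (period, latency) for a linear chain of filtering services.

    Assumptions (illustrative, not taken from the paper's text):
      - service i runs alone on a processor of speed speeds[i];
      - costs[i] is the work of service i per unit of input data;
      - selectivities[i] multiplies the data size seen by service i + 1;
      - communication costs are ignored.
    """
    if not costs:
        raise ValueError("the chain must contain at least one service")
    if not (len(costs) == len(selectivities) == len(speeds)):
        raise ValueError("costs, selectivities and speeds must have equal length")

    size = input_size  # data size entering the current service
    times = []
    for cost, selectivity, speed in zip(costs, selectivities, speeds):
        times.append(size * cost / speed)  # time spent on one data set
        size *= selectivity                # filtering shrinks or expands the data

    period = max(times)   # the slowest processor bounds the throughput
    latency = sum(times)  # one data set traverses the whole chain
    return period, latency
```

For example, `evaluate_chain([4, 2, 8], [0.5, 2.0, 1.0], [1, 1, 1])` returns `(8.0, 13.0)`. Swapping the first two services gives `(8.0, 18.0)`, which illustrates why the order in which filters are applied matters for latency even when the period is unchanged.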
Additional information
Part of this paper appeared in IPDPS’09 and SPAA’09.
About this article
Cite this article
Agrawal, K., Benoit, A., Dufossé, F. et al. Mapping Filtering Streaming Applications. Algorithmica 62, 258–308 (2012). https://doi.org/10.1007/s00453-010-9453-6
DOI: https://doi.org/10.1007/s00453-010-9453-6