Abstract
With recent advances in Big Data storage and processing, there is real potential for data-driven software systems, i.e., systems that analyze large amounts of data to inform their runtime decisions. However, for these decisions to be trustworthy and dependable, one needs to deal with well-known challenges in the data analysis domain: data scarcity, low quality of the data available for analysis, low veracity of the data and of the subsequent analysis results, and data privacy constraints that hinder the analysis. A promising solution is to introduce flexibility in the data analytics part of the system, enabling runtime optimization of the algorithms and data streams based on the combination of veracity, privacy and scarcity, in order to preserve the target level of quality of the data-driven decisions. In this paper, we investigate this solution by providing an adaptive reference architecture and illustrate its applicability with an example from the traffic management domain.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Theodorou, V., Gerostathopoulos, I., Amini, S., Scandariato, R., Prehofer, C., Staron, M. (2017). Theta Architecture: Preserving the Quality of Analytics in Data-Driven Systems. In: Kirikova, M., et al. New Trends in Databases and Information Systems. ADBIS 2017. Communications in Computer and Information Science, vol 767. Springer, Cham. https://doi.org/10.1007/978-3-319-67162-8_19
DOI: https://doi.org/10.1007/978-3-319-67162-8_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67161-1
Online ISBN: 978-3-319-67162-8
eBook Packages: Computer Science, Computer Science (R0)