Theta Architecture: Preserving the Quality of Analytics in Data-Driven Systems

Conference paper, New Trends in Databases and Information Systems (ADBIS 2017)

Abstract

With recent advances in Big Data storage and processing, there is real potential for data-driven software systems, i.e., systems that analyze large amounts of data to inform their runtime decisions. However, for these decisions to be trustworthy and dependable, one needs to deal with well-known challenges in the data analysis domain: data scarcity, low quality of the data available for analysis, low veracity of the data and of the analysis results derived from it, and data privacy constraints that hinder the analysis. A promising solution is to introduce flexibility into the data analytics part of the system, enabling runtime optimization of the algorithms and data streams based on the combination of veracity, privacy, and scarcity, in order to preserve the target level of quality of the data-driven decisions. In this paper, we investigate this solution by providing an adaptive reference architecture and illustrating its applicability with an example from the traffic management domain.
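To make the abstract's central idea more concrete, the following minimal Python sketch shows one way a runtime component might choose among alternative analytics configurations (an algorithm plus its input data streams) so that privacy constraints hold and an estimated quality stays at or above a target. It is not the Theta Architecture itself: the `Configuration` fields, the 0.0 to 1.0 scales, the quality model (veracity multiplied by coverage, with coverage standing in for the inverse of scarcity), the cost-based tie-break, and the function names are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Configuration:
    """Hypothetical analytics configuration: one algorithm fed by a set of data streams."""
    name: str
    veracity: float   # assumed estimate of input trustworthiness, 0.0 to 1.0
    privacy_ok: bool  # True if the configuration satisfies the privacy constraints
    coverage: float   # assumed fraction of required data available (1.0 = no scarcity)
    cost: float       # assumed relative runtime cost


def estimated_quality(c: Configuration) -> float:
    """Assumed quality model: quality drops with low veracity and with data scarcity."""
    return c.veracity * c.coverage


def select_configuration(candidates: List[Configuration],
                         target_quality: float) -> Optional[Configuration]:
    """Return the cheapest privacy-compliant configuration that meets the target, if any."""
    feasible = [c for c in candidates
                if c.privacy_ok and estimated_quality(c) >= target_quality]
    if not feasible:
        return None  # no available configuration can preserve the target quality
    return min(feasible, key=lambda c: c.cost)


def adapt(current: Optional[Configuration],
          candidates: List[Configuration],
          target_quality: float) -> Optional[Configuration]:
    """One adaptation step: keep the current configuration while it still meets the
    target under the latest monitored estimates; otherwise re-select."""
    if (current is not None and current.privacy_ok
            and estimated_quality(current) >= target_quality):
        return current
    return select_configuration(candidates, target_quality)
```

As a hypothetical traffic-management illustration: given a loop-detector-only configuration (veracity 0.9, coverage 0.6, cost 1.0), a configuration adding raw GPS traces that violates privacy, and one adding anonymized GPS traces (veracity 0.8, coverage 0.9, cost 2.5), `select_configuration` picks the cheaper loop-detector option for a target of 0.5 but switches to the anonymized-GPS option for a target of 0.7. Calling `adapt` repeatedly with freshly monitored estimates is what turns this one-off choice into runtime adaptation.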


Author information

Correspondence to Vasileios Theodorou.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Theodorou, V., Gerostathopoulos, I., Amini, S., Scandariato, R., Prehofer, C., Staron, M. (2017). Theta Architecture: Preserving the Quality of Analytics in Data-Driven Systems. In: Kirikova, M., et al. New Trends in Databases and Information Systems. ADBIS 2017. Communications in Computer and Information Science, vol 767. Springer, Cham. https://doi.org/10.1007/978-3-319-67162-8_19

  • DOI: https://doi.org/10.1007/978-3-319-67162-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67161-1

  • Online ISBN: 978-3-319-67162-8

  • eBook Packages: Computer Science, Computer Science (R0)
