Abstract
In this chapter, we discuss some issues concerning the computation of machine learning models for data analytics on the Internet-of-Everything (IoE). We model such computations as compositions of services that form a process whose main stages are acquisition, preparation, model training, and model-based inference. Then, we discuss randomization-as-a-service as a key technique for limiting undesired information disclosure during this process. We recall some fundamental results showing that randomization decreases the severity of disclosure but, at the same time, has an adverse effect on data utility, which in our case is the business value of the data within the specific IoE application. We argue that non-interactive randomization at data acquisition time, while decreasing utility, can provide maximum flexibility and best accommodate provisions for compliance with regulations, ethics, and cultural factors.
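The view of analytics as a composition of services, with randomization applied once at acquisition time, can be illustrated with a minimal Python sketch. It is not the chapter's implementation: it assumes readings normalized to [0, 1], uses local Laplace perturbation as the (hypothetical) randomization service, and treats the sample mean as the "model". The stage names `randomize_at_acquisition`, `prepare`, `train`, and `infer` are illustrative only.

```python
import math
import random
from functools import reduce
from typing import Callable, List


def compose(*services: Callable) -> Callable:
    """Chain services left to right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, service: service(acc), services, x)


def laplace(scale: float, rng: random.Random) -> float:
    """One Laplace(0, scale) draw, as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def randomize_at_acquisition(epsilon: float, seed: int = 0) -> Callable:
    """Hypothetical randomization-as-a-service stage (local, non-interactive).

    Each reading is assumed to lie in [0, 1], so its sensitivity is 1 and it is
    released exactly once as reading + Laplace(1 / epsilon). Later stages only
    ever see the perturbed values.
    """
    rng = random.Random(seed)
    return lambda readings: [r + laplace(1.0 / epsilon, rng) for r in readings]


def prepare(readings: List[float]) -> List[float]:
    """Preparation stage (placeholder): drop non-finite (e.g. NaN) readings."""
    return [r for r in readings if math.isfinite(r)]


def train(readings: List[float]) -> float:
    """Training stage: the 'model' is simply the sample mean."""
    return sum(readings) / len(readings)


def infer(model_mean: float) -> Callable[[float], bool]:
    """Inference stage: a predictor flagging readings above the learned mean."""
    return lambda x: x > model_mean


if __name__ == "__main__":
    rng = random.Random(42)
    raw = [rng.random() for _ in range(10_000)]  # synthetic readings in [0, 1)
    exact = train(raw)
    for eps in (0.1, 1.0, 10.0):
        estimate = compose(randomize_at_acquisition(eps), prepare, train)(raw)
        print(f"epsilon={eps:>5}: |error of mean| = {abs(estimate - exact):.4f}")
```

Because the noise is added before the data leave the acquisition stage, every later stage, including ones not foreseen at collection time, works on the same randomized values. The price is utility: in this sketch the standard deviation of the error on the mean is \(\sqrt{2}/(\varepsilon\sqrt{n})\), so it grows as \(\varepsilon\) shrinks.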
E. Damiani: This chapter was written while Ernesto Damiani was on leave from Dipartimento di Informatica, Università degli Studi di Milano, Italy.
Notes
1. Here, the term “pipeline” is used loosely to designate any computation involving all or some of these stages, regardless of their order.
2. In this case, S could be obtained by sampling DS. For the sake of simplicity, we shall assume \(S=DS\) in the remainder of this Section, so the “estimate” is in fact the exact value.
3. If \(S \subsetneq DS\), i.e., a sample of customers is considered for computing the average, we expect a sampling error of the order of \(O(1/\sqrt{n})\). The Laplace random noise we have introduced has standard deviation \(O(1/n)\), which is lower than the sampling error.
4. The interested reader is referred to Michael Nielsen’s excellent online book (http://neuralnetworksanddeeplearning.com/).
5. Again, adding regularization terms implicitly transforms training \(F_w\) into training a different \(F'_{w'}\), hopefully less prone to local minima.
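The comparison in Note 3 can be checked numerically. The sketch below assumes the averaged values are bounded in [0, upper], so replacing one of the n values changes the average by at most upper/n; the Laplace mechanism then adds noise with scale \(b = \mathit{upper}/(n\varepsilon)\) and standard deviation \(\sqrt{2}\,b = O(1/n)\), whereas the standard error of a random-sample mean is \(\sigma/\sqrt{n}\). The function names, the bound `upper`, and the value of \(\varepsilon\) are illustrative assumptions, not the chapter's notation.

```python
import math
import random
import statistics
from typing import List


def laplace(scale: float, rng: random.Random) -> float:
    """One Laplace(0, scale) draw, as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values: List[float], upper: float, epsilon: float,
                 rng: random.Random) -> float:
    """Laplace mechanism for the mean of n values assumed to lie in [0, upper]."""
    n = len(values)
    sensitivity = upper / n  # replacing one value moves the mean by at most upper/n
    return sum(values) / n + laplace(sensitivity / epsilon, rng)


def laplace_std(n: int, upper: float, epsilon: float) -> float:
    """Std. deviation of the added noise: sqrt(2) * b, with b = upper / (n * epsilon)."""
    return math.sqrt(2) * upper / (n * epsilon)


def sampling_std_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean, sigma / sqrt(n) (no finite-population correction)."""
    return sigma / math.sqrt(n)


if __name__ == "__main__":
    rng = random.Random(7)
    upper, epsilon = 100.0, 1.0
    # Synthetic per-customer values in [0, upper]; purely illustrative data.
    population = [rng.uniform(0, upper) for _ in range(100_000)]
    sigma = statistics.pstdev(population)
    for n in (100, 1_000, 10_000):
        sample = rng.sample(population, n)
        print(f"n={n:>6}  noisy mean={private_mean(sample, upper, epsilon, rng):8.3f}  "
              f"Laplace std={laplace_std(n, upper, epsilon):.4f}  "
              f"sampling std error={sampling_std_error(sigma, n):.4f}")
```

With \(\varepsilon = 1\) and values in [0, 100], the noise standard deviation at \(n = 100\) is \(\sqrt{2} \approx 1.41\), already below the sampling error of a Uniform(0, 100) population (\(\sigma/\sqrt{n} \approx 2.89\)); the ratio between the two grows like \(\sqrt{n}\).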
Acknowledgements
This work was supported by the H2020 EU-funded project EVOTION (grant agreement no. H2020-727521).
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Cimato, S., Damiani, E. (2018). Some Ideas on Privacy-Aware Data Analytics in the Internet-of-Everything. In: Samarati, P., Ray, I., Ray, I. (eds) From Database to Cyber Security. Lecture Notes in Computer Science, vol 11170. Springer, Cham. https://doi.org/10.1007/978-3-030-04834-1_6
DOI: https://doi.org/10.1007/978-3-030-04834-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04833-4
Online ISBN: 978-3-030-04834-1
eBook Packages: Computer Science, Computer Science (R0)