Some Ideas on Privacy-Aware Data Analytics in the Internet-of-Everything

Cimato, Stelvio; Damiani, Ernesto

doi:10.1007/978-3-030-04834-1_6


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11170)


Abstract

In this chapter, we discuss some issues concerning the computation of machine learning models for data analytics on the Internet-of-Everything (IoE). We model such computations as compositions of services that form a process whose main stages are acquisition, preparation, model training, and model-based inference. We then discuss randomization-as-a-service as a key technique for limiting undesired information disclosure during this process. We recall some fundamental results showing that randomization decreases the severity of disclosure but, at the same time, has an adverse effect on data utility, in our case the business value of the data within the specific IoE application. We argue that non-interactive randomization at data acquisition time, while decreasing utility, can provide maximum flexibility and best accommodate provisions for compliance with regulations, ethics, and cultural factors.
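As an illustration of non-interactive randomization at acquisition time, the sketch below uses classic randomized response, a standard local-randomization technique chosen here for illustration and not necessarily the mechanism developed in the chapter. Each device perturbs its own binary attribute before release, keeping the true value with probability \(p = e^{\varepsilon}/(1+e^{\varepsilon})\); the collector never sees raw values but can still debias the aggregate. The functions `randomize_bit`, `estimate_proportion` and `simulate`, and the parameters `epsilon`, `n` and `true_rate`, are hypothetical names introduced for this example.

```python
import math
import random


def randomize_bit(true_bit: int, epsilon: float, rng: random.Random) -> int:
    """Acquisition-time randomization (randomized response).

    Keeps the true bit with probability p = e^eps / (1 + e^eps) and reports
    its complement otherwise; the likelihood ratio p / (1 - p) = e^eps, so each
    report satisfies eps-local differential privacy.
    """
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if rng.random() < p_keep else 1 - true_bit


def estimate_proportion(reports: list[int], epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 1s from randomized reports."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = pi * p + (1 - pi) * (1 - p); solve this for pi.
    return (observed - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)


def simulate(epsilon: float, n: int = 20_000, true_rate: float = 0.3, seed: int = 0) -> float:
    """Hypothetical end-to-end run: devices randomize, collector debiases."""
    rng = random.Random(seed)
    true_bits = [1 if rng.random() < true_rate else 0 for _ in range(n)]
    reports = [randomize_bit(b, epsilon, rng) for b in true_bits]  # only these leave the device
    return estimate_proportion(reports, epsilon)


if __name__ == "__main__":
    for eps in (5.0, 1.0, 0.1):
        print(eps, simulate(eps))
```

Running `simulate` with decreasing `epsilon` should typically show the estimate drifting further from `true_rate`, because the debiasing factor \(1/(2p-1)\) inflates the variance as \(p\) approaches 1/2: stronger protection against disclosure, lower utility, which is the trade-off the abstract describes.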

E. Damiani—This chapter was written while Ernesto Damiani was on leave from Dipartimento di Informatica, Università degli Studi di Milano, Italia.


Notes

1. Here, the term “pipeline” is used loosely to designate any computation involving all or some of these stages, regardless of their order.
2. In this case, S could be obtained by sampling DS. For simplicity, we assume \(S = DS\) in the remainder of this section, so the “estimate” is in fact the exact value.
3. If \(S \subsetneq DS\), i.e., a sample of customers is used to compute the average, we expect a sampling error of order \(O(1/\sqrt{n})\). The Laplace noise we have introduced has standard deviation \(O(1/n)\), which is asymptotically smaller than the sampling error.
4. The interested reader is referred to Michael Nielsen’s excellent online book, Neural Networks and Deep Learning (http://neuralnetworksanddeeplearning.com/).
5. Again, adding regularization terms implicitly transforms training \(F_w\) into training a different model \(F'_{w'}\), which is hopefully less prone to local minima.
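Note 3 compares two error terms for an average computed over n customers. The sketch below makes that comparison concrete under assumptions of our own: each customer's value is clipped to a known range \([0, B]\), neighbouring datasets differ in one record (so the mean has sensitivity \(B/n\)), and the noise follows the standard Laplace mechanism with scale \(B/(n\varepsilon)\). The names `private_mean`, `noise_std` and `sampling_std`, and the numbers in the usage loop, are illustrative and not taken from the chapter.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values: list[float], upper_bound: float, epsilon: float,
                 rng: random.Random) -> float:
    """Laplace mechanism for the mean of values clipped to [0, upper_bound]."""
    n = len(values)
    clipped = [min(max(v, 0.0), upper_bound) for v in values]
    sensitivity = upper_bound / n  # changing one record moves the mean by at most B / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


def noise_std(n: int, upper_bound: float, epsilon: float) -> float:
    """Std of the added noise: sqrt(2) * B / (n * eps), i.e. O(1/n)."""
    return math.sqrt(2.0) * upper_bound / (n * epsilon)


def sampling_std(population_std: float, n: int) -> float:
    """Standard error of a sample mean: sigma / sqrt(n), i.e. O(1/sqrt(n))."""
    return population_std / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical setting: B = 100, eps = 1, population std = 20.
    for n in (100, 10_000, 1_000_000):
        print(n, noise_std(n, 100.0, 1.0), sampling_std(20.0, n))
```

With these illustrative numbers, `noise_std` is about 1.41 at n = 100 against a sampling error of 2.0, and about 0.014 at n = 10,000 against 0.2; the ratio between the two shrinks like \(1/\sqrt{n}\). For small n or very small \(\varepsilon\) the constants can make the noise dominate, so the comparison in note 3 should be read as asymptotic.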


Acknowledgements

This work was supported by H2020 EU-funded project EVOTION (grant agreement n. H2020-727521).

Author information

Authors and Affiliations

Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
Stelvio Cimato
EBTIC - Khalifa University of Science and Technology, Abu Dhabi, UAE
Ernesto Damiani


Corresponding author

Correspondence to Ernesto Damiani.

Editor information

Editors and Affiliations

Università degli Studi di Milano, Milano, Italy
Pierangela Samarati
Colorado State University, Fort Collins, CO, USA
Indrajit Ray
Colorado State University, Fort Collins, CO, USA
Indrakshi Ray


About this chapter

Cite this chapter

Cimato, S., Damiani, E. (2018). Some Ideas on Privacy-Aware Data Analytics in the Internet-of-Everything. In: Samarati, P., Ray, I., Ray, I. (eds) From Database to Cyber Security. Lecture Notes in Computer Science, vol 11170. Springer, Cham. https://doi.org/10.1007/978-3-030-04834-1_6


DOI: https://doi.org/10.1007/978-3-030-04834-1_6
Published: 30 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04833-4
Online ISBN: 978-3-030-04834-1
eBook Packages: Computer Science (R0)
