Some Ideas on Privacy-Aware Data Analytics in the Internet-of-Everything

Chapter in the book From Database to Cyber Security

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11170)

Abstract

In this chapter, we discuss some issues concerning the computation of machine learning models for data analytics on the Internet-of-Everything. We model such computations as compositions of services that form a process whose main stages are acquisition, preparation, model training, and model-based inference. Then, we discuss randomization-as-a-service as a key technique for limiting undesired information disclosure during this process. We recall some fundamental results showing that randomization decreases the severity of disclosure, but at the same time has an adverse effect on data utility, in our case the data business value within the specific IoE application. We argue that non-interactive randomization at data acquisition time, while decreasing utility, can provide maximum flexibility and best accommodate provisions for compliance with regulations, ethics, and cultural factors.

E. Damiani—This chapter was written while Ernesto Damiani was on leave from Dipartimento di Informatica, Università degli Studi di Milano, Italia.

Notes

  1. Here, the term “pipeline” is used loosely to designate any computation involving all or some of these stages, regardless of their order.

  2. In this case, \(S\) could be obtained by sampling \(DS\). For the sake of simplicity, we shall assume \(S=DS\) in the remainder of this Section. So, the “estimate” is in fact the real value.

  3. If \(S \subsetneq DS\), i.e. a sample of customers is considered for computing the average, we expect a sampling error of the order \(O(1/\sqrt{n})\). The Laplace random noise we have introduced has standard deviation \(O(1/n)\), which is lower than the sampling error.

  4. The interested reader is referred to Michael Nielsen’s excellent online book (http://neuralnetworksanddeeplearning.com/).

  5. Again, adding regularization terms implicitly transforms training \(F_w\) into training a different \(F'_{w'}\), hopefully less prone to local minima.
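
The comparison in Note 3 can be illustrated with a minimal sketch. It is not the chapter's own code, and it rests on assumptions not stated in the notes: values are clipped to a known range \([0, \mathit{upper}]\), the privacy parameter is called epsilon, and the population standard deviation is called sigma. The function names and the numeric settings are hypothetical. Under these assumptions the Laplace noise added to an average has standard deviation \(\sqrt{2}\,\mathit{upper}/(n\,\epsilon)\), which is \(O(1/n)\), while the standard error of a sample mean is \(\sigma/\sqrt{n}\), which is \(O(1/\sqrt{n})\).

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mean(values, upper, epsilon, rng):
    """Mean of values clipped to [0, upper], perturbed with Laplace noise.

    Changing one record moves the clipped mean by at most upper / n,
    so the noise scale is set to upper / (n * epsilon).
    """
    n = len(values)
    clipped = [min(max(v, 0.0), upper) for v in values]
    scale = upper / (n * epsilon)
    return sum(clipped) / n + laplace_noise(scale, rng)


def laplace_std(n, upper, epsilon):
    """Standard deviation of the added noise: sqrt(2) * scale, i.e. O(1/n)."""
    return math.sqrt(2.0) * upper / (n * epsilon)


def sampling_std(n, sigma):
    """Standard error of a sample mean: sigma / sqrt(n), i.e. O(1/sqrt(n))."""
    return sigma / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    upper, epsilon, sigma = 100.0, 1.0, 20.0
    for n in (100, 10_000, 1_000_000):
        print(n, laplace_std(n, upper, epsilon), sampling_std(n, sigma))
```

With these illustrative settings the noise term shrinks faster than the sampling term as \(n\) grows. For small \(n\) or small epsilon the ordering can reverse, so the statement in Note 3 is best read as an asymptotic one.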

Acknowledgements

This work was supported by H2020 EU-funded project EVOTION (grant agreement n. H2020-727521).

Author information

Corresponding author

Correspondence to Ernesto Damiani.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Cimato, S., Damiani, E. (2018). Some Ideas on Privacy-Aware Data Analytics in the Internet-of-Everything. In: Samarati, P., Ray, I., Ray, I. (eds) From Database to Cyber Security. Lecture Notes in Computer Science, vol 11170. Springer, Cham. https://doi.org/10.1007/978-3-030-04834-1_6

  • DOI: https://doi.org/10.1007/978-3-030-04834-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04833-4

  • Online ISBN: 978-3-030-04834-1

  • eBook Packages: Computer Science (R0)
