Analysis of Privacy Preserving Data Publishing Techniques for Various Feature Selection Stability Measures

  • Conference paper
Computational Intelligence, Communications, and Business Analytics (CICBA 2017)

Abstract

Data mining is the extraction of previously unknown and valid information from the archived data of organizations. The datasets are mostly high dimensional, which makes the data mining process difficult. Feature selection is a dimensionality reduction technique in data mining. Selection stability is the robustness of a feature selection algorithm to small perturbations of the dataset, i.e., its ability to select the same or a similar subset of features in each subsequent iteration. Selection stability depends mostly on the characteristics of the dataset. Privacy preserving data publishing techniques modify the dataset to preserve the privacy of individuals, and this perturbation affects selection stability. The perturbation applied for privacy preservation, feature selection stability, and the accuracy of the data mining results (i.e., data utility) are correlated. Various metrics exist for measuring selection stability. This paper analyses privacy preserving data publishing techniques against these feature selection stability measures with respect to privacy preservation, selection stability, and data utility.
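To make the notion of selection stability concrete, the following is a minimal sketch of how a stability score can be computed from the feature subsets chosen on several versions of a dataset (for example, the original and its privacy-perturbed copies). The abstract does not say which metrics the paper uses; the sketch assumes two commonly used ones, the Jaccard index and Kuncheva's consistency index, averaged over all pairs of subsets. The function names and the example subsets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def kuncheva(a, b, n_features):
    """Kuncheva's consistency index for two equal-size subsets.

    I_C = (r * n - k^2) / (k * (n - k)), where r = |A ∩ B|,
    k = |A| = |B| and n is the total number of features.
    """
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k:
        raise ValueError("both subsets must have the same size")
    if k == 0 or k >= n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    r = len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))


def mean_pairwise(subsets, measure):
    """Average a pairwise similarity measure over all pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(measure(a, b) for a, b in pairs) / len(pairs)


# Hypothetical example: feature indices selected on three versions
# of a 10-feature dataset (values are illustrative, not from the paper).
selected = [[0, 1, 2], [0, 1, 3], [0, 1, 2]]
print(mean_pairwise(selected, jaccard))
print(mean_pairwise(selected, lambda a, b: kuncheva(a, b, n_features=10)))
```

A score near 1 means the selector picks nearly the same features regardless of the perturbation. Unlike the Jaccard index, Kuncheva's index corrects for the overlap expected by chance, so it is about 0 for independently drawn random subsets.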

Author information

Corresponding author

Correspondence to Mohana Chelvan P.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

P., M.C., K., P. (2017). Analysis of Privacy Preserving Data Publishing Techniques for Various Feature Selection Stability Measures. In: Mandal, J., Dutta, P., Mukhopadhyay, S. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol 775. Springer, Singapore. https://doi.org/10.1007/978-981-10-6427-2_46

  • DOI: https://doi.org/10.1007/978-981-10-6427-2_46

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6426-5

  • Online ISBN: 978-981-10-6427-2

  • eBook Packages: Computer Science (R0)
