Analysis of Privacy Preserving Data Publishing Techniques for Various Feature Selection Stability Measures

P., Mohana Chelvan; K., Perumal

doi:10.1007/978-981-10-6427-2_46

Mohana Chelvan P. & Perumal K.

Part of the book series: Communications in Computer and Information Science (CCIS, volume 775)

Included in the following conference series:

International Conference on Computational Intelligence, Communications, and Business Analytics


Abstract

Data mining is the extraction of previously unknown and valid information from the data archived by organizations. These datasets are mostly high-dimensional, which makes the data mining process difficult. Feature selection is a dimensionality reduction technique in data mining. Selection stability is the robustness of a feature selection algorithm to small perturbations of the dataset, i.e., its ability to select the same or a similar subset of features in each subsequent iteration. Selection stability depends mostly on the characteristics of the dataset. Privacy-preserving data publishing techniques modify the dataset to preserve the privacy of individuals, and this perturbation affects selection stability. There is therefore a correlation among the perturbation of the dataset for privacy preservation, feature selection stability, and the accuracy of the data mining results, i.e., data utility. Various metrics exist for measuring selection stability. This paper analyses privacy-preserving data publishing techniques under these feature selection stability measures with respect to privacy preservation, selection stability, and data utility.
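To make "selection stability measures" concrete, two commonly used measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each run of a feature selection algorithm (for example, one run on the original data and further runs on anonymized or resampled copies) returns a set of feature indices, and it averages a pairwise similarity over all pairs of runs. The function names `jaccard`, `kuncheva_index`, and `average_pairwise_stability` are hypothetical. Kuncheva's consistency index (Kuncheva, 2007) assumes all subsets have the same size k and corrects for the overlap expected by chance; the Jaccard index does not.

```python
from functools import partial
from itertools import combinations
from typing import AbstractSet, Callable, Sequence


def jaccard(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two selected feature subsets."""
    union = a | b
    if not union:  # two empty selections are treated as identical
        return 1.0
    return len(a & b) / len(union)


def kuncheva_index(a: AbstractSet[int], b: AbstractSet[int], n_features: int) -> float:
    """Kuncheva's consistency index for two subsets of equal size k out of n features.

    KI = (r * n - k^2) / (k * (n - k)), where r = |A & B|.
    KI is 0 when the overlap equals the chance level k^2 / n and 1 for identical subsets.
    """
    k = len(a)
    if len(b) != k:
        raise ValueError("Kuncheva's index requires subsets of equal size")
    if not 0 < k < n_features:
        raise ValueError("Kuncheva's index is undefined for k = 0 or k = n")
    r = len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))


def average_pairwise_stability(
    subsets: Sequence[AbstractSet[int]],
    measure: Callable[[AbstractSet[int], AbstractSet[int]], float],
) -> float:
    """Average a pairwise similarity over all m(m-1)/2 pairs of runs."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(measure(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Illustrative subsets only (not data from the paper): three runs that each
    # select k = 3 of n = 10 features, e.g. on original and perturbed copies.
    runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
    n = 10
    print(f"Jaccard:  {average_pairwise_stability(runs, jaccard):.3f}")
    print(f"Kuncheva: {average_pairwise_stability(runs, partial(kuncheva_index, n_features=n)):.3f}")
```

With these made-up runs the script prints an average Jaccard similarity of 0.667 (exactly 2/3) and an average Kuncheva index of 0.683 (exactly 43/63). A lower value after anonymization would indicate that the privacy-preserving perturbation reduced selection stability.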

Author information

Authors and Affiliations

Department of Computer Science, Hindustan College of Arts and Science, Chennai, India
Mohana Chelvan P.
Department of Computer Applications, Madurai Kamaraj University, Madurai, India
Perumal K.

Corresponding author

Correspondence to Mohana Chelvan P.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
J. K. Mandal
Department of Computer and System Sciences, Visva Bharati University, Bolpur Santiniketan, West Bengal, India
Paramartha Dutta
Department of Information Technology, Calcutta Business School, Kolkata, India
Somnath Mukhopadhyay

About this paper

Cite this paper

P., M.C., K., P. (2017). Analysis of Privacy Preserving Data Publishing Techniques for Various Feature Selection Stability Measures. In: Mandal, J., Dutta, P., Mukhopadhyay, S. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol 775. Springer, Singapore. https://doi.org/10.1007/978-981-10-6427-2_46

DOI: https://doi.org/10.1007/978-981-10-6427-2_46
Published: 24 September 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6426-5
Online ISBN: 978-981-10-6427-2
eBook Packages: Computer Science, Computer Science (R0)
