Abstract
Data mining extracts previously unknown and valid information from the archived data of organizations. These datasets are mostly high dimensional, which makes the data mining process difficult. Feature selection is a dimensionality reduction technique used in data mining. Selection stability is the robustness of a feature selection algorithm to small perturbations of the dataset, i.e., its ability to select the same or a similar subset of features in each subsequent iteration. Selection stability depends largely on the characteristics of the dataset. Privacy preserving data publishing techniques modify the dataset to protect the privacy of individuals, and this perturbation affects selection stability. There is a correlation between the perturbation applied to the dataset for privacy preservation, feature selection stability, and the accuracy of the data mining results, i.e., data utility. Various metrics exist to measure selection stability. This paper analyses privacy preserving data publishing techniques under these feature selection stability measures with respect to privacy preservation, selection stability and data utility.
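As an illustration of what a selection stability measure computes, the sketch below scores how similar the selected feature subsets are across perturbed versions of a dataset. It assumes the average pairwise Jaccard similarity as the measure, which is only one of several possible choices and is not necessarily among the measures analysed in this paper. The function names and the feature names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        # Two empty selections are treated as identical (an assumption).
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Average pairwise Jaccard similarity over all selected subsets.

    A value of 1.0 means every run selected the same features;
    values near 0.0 mean the selections barely overlap.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets chosen by the same selector on three versions
# of a dataset: the original, a privacy-perturbed copy, and a resample.
original = {"age", "zipcode", "income"}
anonymized = {"age", "income", "education"}
resampled = {"age", "zipcode", "income"}

score = selection_stability([original, anonymized, resampled])
print(round(score, 4))
```

In this toy case the perturbed copy swaps one feature, so its similarity to each of the other two selections is 0.5, which pulls the average below 1.0. A stronger perturbation for privacy would be expected to lower the score further.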
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
P., M.C., K., P. (2017). Analysis of Privacy Preserving Data Publishing Techniques for Various Feature Selection Stability Measures. In: Mandal, J., Dutta, P., Mukhopadhyay, S. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol 775. Springer, Singapore. https://doi.org/10.1007/978-981-10-6427-2_46
DOI: https://doi.org/10.1007/978-981-10-6427-2_46
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6426-5
Online ISBN: 978-981-10-6427-2
eBook Packages: Computer Science, Computer Science (R0)