Abstract
Privacy protection is a fundamental issue in the era of big data. Personalized privacy protection requires measuring the amount or degree of privacy leakage. Such measurement in turn requires determining the semantic similarity and relatedness of words, because in big data the words come from multiple sources and can appear in many different forms. WordNet has been widely used to measure the semantic similarity of words. This paper analyzes the suitability of applying WordNet to measuring the semantic similarity or relatedness of words in the field of privacy. The analysis comprises an experiment designed to collect human rating scores as a benchmark dataset, and a comprehensive comparison between those scores and the results of four WordNet-based measures. The analysis concludes that current WordNet-based measures are not well suited to privacy measurement. The paper therefore also suggests possible ways of enhancing WordNet to make semantic similarity and relatedness measurement more effective in the field of privacy, in support of more effective and personalized privacy protection.
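To make the kind of comparison described above concrete, the following is a minimal sketch and not the paper's procedure. It assumes a tiny hand-made is-a hierarchy in place of WordNet, uses a simple edge-counting similarity (the abstract does not name the four measures evaluated), and uses Pearson correlation as the agreement statistic. The hierarchy, the word pairs, and the human ratings are all hypothetical and exist only to show the mechanics.

```python
from math import sqrt

# Hypothetical toy is-a hierarchy (child -> parent); NOT taken from WordNet.
PARENT = {
    "identifier": "information",
    "contact": "information",
    "passport_number": "identifier",
    "license_number": "identifier",
    "email": "contact",
    "phone": "contact",
}


def ancestors(word):
    """Return the chain from `word` up to the root, starting with `word`."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_similarity(a, b):
    """Edge-counting similarity: 1 / (1 + edges on the shortest is-a path)."""
    up_b = {node: steps for steps, node in enumerate(ancestors(b))}
    for steps_a, node in enumerate(ancestors(a)):
        if node in up_b:  # first shared node is the lowest common ancestor
            return 1.0 / (1 + steps_a + up_b[node])
    return 0.0  # no common ancestor in the hierarchy


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical word pairs with made-up human ratings on a 0-4 scale.
RATED_PAIRS = [
    ("passport_number", "license_number", 3.5),
    ("email", "phone", 3.0),
    ("passport_number", "email", 2.5),
    ("passport_number", "identifier", 3.8),
]

scores = [path_similarity(a, b) for a, b, _ in RATED_PAIRS]
ratings = [rating for _, _, rating in RATED_PAIRS]
r = pearson(scores, ratings)
print(f"correlation between measure and human ratings: {r:.3f}")
```

A low correlation on a real benchmark would indicate that the measure ranks word pairs differently from human judges, which is the kind of mismatch the paper reports for privacy-related vocabulary. A rank correlation such as Spearman's could be substituted for `pearson` without changing the rest of the sketch.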







Acknowledgements
The work in this paper was supported by the National Natural Science Foundation of China (No. 61602456) and the National High Technology Research and Development Program of China (863 Program) (No. 2015AA017204).
Cite this article
Zhu, N., Wang, S., He, J. et al. On the Suitability of Applying WordNet to Privacy Measurement. Wireless Pers Commun 103, 359–378 (2018). https://doi.org/10.1007/s11277-018-5447-5
DOI: https://doi.org/10.1007/s11277-018-5447-5