Abstract
Semantic similarity quantifies the resemblance between the meanings of textual terms, and is therefore a cornerstone of textual understanding. Given the increasing availability and importance of textual sources in today's Information Society, considerable attention has been devoted in recent years to developing mechanisms that automatically measure semantic similarity and to applying them to tasks that deal with textual inputs (e.g., document classification, information retrieval, question answering and privacy protection). This chapter describes and discusses recent findings and proposals published by the authors on semantic similarity. It also details recent works that apply semantic similarity to the privacy protection of textual data.
Acknowledgments
Authors are solely responsible for the views expressed in this chapter, which do not necessarily reflect the position of UNESCO nor commit that organisation. This work was partly supported by the European Commission under FP7 project Inter-Trust, by the Spanish Ministry of Science and Innovation (through projects eAEGIS TSI2007-65406-C03-01, ICWT TIN2012-32757, ARES-CONSOLIDER INGENIO 2010 CSD2007-00004, CO-PRIVACY TIN2011-27076-C03-01 and BallotNext IPT-2012-0603-430000) and by the Government of Catalonia (under grant 2009 SGR 1135).
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Batet, M., Sánchez, D. (2015). Contributions on Semantic Similarity and Its Applications to Data Privacy. In: Navarro-Arribas, G., Torra, V. (eds) Advanced Research in Data Privacy. Studies in Computational Intelligence, vol 567. Springer, Cham. https://doi.org/10.1007/978-3-319-09885-2_8
DOI: https://doi.org/10.1007/978-3-319-09885-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09884-5
Online ISBN: 978-3-319-09885-2
eBook Packages: Engineering, Engineering (R0)