Abstract
In this paper, we first present five state-of-the-art private blocking methods, which rely mainly on random strings, clustering, and public reference sets. We highlight the drawbacks of these methods and then present our L-fold redundant blocking scheme, which relies on the Locality-Sensitive Hashing technique to identify similar records. These records have first been anonymized using a Bloom filter-based encoding technique. Finally, we perform an experimental evaluation of all these methods and present the results.
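To make the idea of redundant LSH blocking over Bloom filter encodings more concrete, the following is a minimal sketch under stated assumptions, not the scheme evaluated in the chapter. It assumes bigram tokenization, unkeyed SHA-256 hashing (a real deployment would presumably use keyed hashes shared by the data custodians), and Hamming LSH that samples bit positions. All function names and the parameters `m`, `k`, `K`, and the use of `L` as the number of hash tables are illustrative assumptions.

```python
import hashlib
import random
from collections import defaultdict


def bigrams(value):
    """Split a string into its set of lowercase character bigrams."""
    value = value.lower()
    return {value[i:i + 2] for i in range(len(value) - 1)}


def bloom_encode(value, m=64, k=4):
    """Encode the bigrams of `value` into an m-bit Bloom filter with k hash functions.

    Assumption: hash functions are simulated by seeding SHA-256 with an index.
    """
    bits = [0] * m
    for gram in bigrams(value):
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return tuple(bits)


def make_hash_functions(m, L, K, rng):
    """Draw L composite hash functions, each sampling K distinct bit positions."""
    return [rng.sample(range(m), K) for _ in range(L)]


def block(records, hash_functions):
    """Insert every Bloom filter into L hash tables (one per composite hash)."""
    tables = [defaultdict(set) for _ in hash_functions]
    for rec_id, bloom in records.items():
        for table, positions in zip(tables, hash_functions):
            key = tuple(bloom[p] for p in positions)
            table[key].add(rec_id)
    return tables


def candidate_pairs(tables):
    """Collect every pair of record ids that shares a bucket in at least one table."""
    pairs = set()
    for table in tables:
        for ids in table.values():
            ordered = sorted(ids)
            for i, first in enumerate(ordered):
                for second in ordered[i + 1:]:
                    pairs.add((first, second))
    return pairs


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so the sampled positions are repeatable
    hash_functions = make_hash_functions(m=64, L=10, K=8, rng=rng)
    records = {
        "a1": bloom_encode("jonathan"),
        "b1": bloom_encode("jonathon"),
        "b2": bloom_encode("margaret"),
    }
    print(candidate_pairs(block(records, hash_functions)))
```

In this sketch, records whose Bloom filters differ in few bit positions are likely to agree on all `K` sampled positions in at least one of the `L` tables, which is why the blocking is repeated redundantly; increasing `K` makes buckets more selective, while increasing `L` lowers the chance of missing a similar pair.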
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Karapiperis, D., Verykios, V.S., Katsiri, E., Delis, A. (2016). A Tutorial on Blocking Methods for Privacy-Preserving Record Linkage. In: Karydis, I., Sioutas, S., Triantafillou, P., Tsoumakos, D. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2015. Lecture Notes in Computer Science, vol 9511. Springer, Cham. https://doi.org/10.1007/978-3-319-29919-8_1
DOI: https://doi.org/10.1007/978-3-319-29919-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29918-1
Online ISBN: 978-3-319-29919-8
eBook Packages: Computer Science (R0)