Private Data Discovery for Privacy Compliance in Collaborative Environments

Korba, Larry; Wang, Yunli; Geng, Liqiang; Song, Ronggong; Yee, George; Patrick, Andrew S.; Buffett, Scott; Liu, Hongyu; You, Yonghua

doi:10.1007/978-3-540-88011-0_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5220)

Included in the following conference series:

International Conference on Cooperative Design, Visualization and Engineering

Abstract

With the growing use of computers and the Internet, it has become difficult for organizations to locate and effectively manage sensitive personally identifiable information (PII). The problem is even more evident in collaborative computing environments. PII may be hidden anywhere within the file system of a computer. Moreover, in the course of different activities, collaborative or not, PII may migrate from computer to computer, which makes meeting organizational privacy requirements all the more complex. Our particular interest is to develop technology that automatically discovers workflows among organizational collaborators that involve private data. Because in this context it is important to understand where and when private data is discovered, this paper focuses on PII discovery, i.e., automatically identifying private data present in semi-structured and unstructured (free-text) documents. The first part of the process identifies PII via named entity recognition. The second part determines relationships between those entities using a supervised machine learning method. We present test results of our methods on publicly available data generated from different collaborative activities to assess scalability in cooperative computing environments.
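
The two-part process described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the recognizer or the learning method, so everything below is an assumption made for illustration: the regular-expression patterns stand in for named entity recognition, the character distance between two entities is a hypothetical feature, and a learned distance cut-off stands in for the supervised relation classifier. All function and variable names are hypothetical and are not taken from the paper.

```python
import re
from itertools import combinations

# Hypothetical patterns for a few PII types (assumption: the paper's
# recognizer is not described here; real systems need far broader coverage).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "NAME": re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+ [A-Z][a-z]+"),
}


def find_entities(text):
    """Part 1: return (type, value, start_offset) tuples sorted by position."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(), match.start()))
    return sorted(found, key=lambda entity: entity[2])


def candidate_pairs(entities):
    """Pair entities of different types and record their character distance."""
    pairs = []
    for a, b in combinations(entities, 2):
        if a[0] != b[0]:
            pairs.append((a, b, abs(b[2] - a[2])))
    return pairs


def train_threshold(labelled):
    """Part 2 (toy supervised step): choose the distance cut-off that
    classifies the most labelled (distance, is_related) examples correctly."""
    best_cut, best_correct = 0, -1
    for cut in sorted({distance for distance, _ in labelled}):
        correct = sum((distance <= cut) == related for distance, related in labelled)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


def related_pairs(text, cut):
    """Return value pairs whose entities lie within the learned cut-off."""
    pairs = candidate_pairs(find_entities(text))
    return [(a[1], b[1]) for a, b, distance in pairs if distance <= cut]
```

A caller would first fit the cut-off on hand-labelled pairs, for example `cut = train_threshold([(10, True), (25, True), (120, False), (300, False)])`, and then call `related_pairs(document_text, cut)` on each document. A single distance feature is chosen only to keep the sketch short; a practical relation classifier would combine lexical and syntactic features as well.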

National Research Council Paper Number 50386.

Author information

Authors and Affiliations

Institute for Information Technology, National Research Council of Canada, Building M-50, Montreal Road, Ottawa, Ontario, K1A 0R6
Larry Korba, Yunli Wang, Liqiang Geng, Ronggong Song, George Yee, Andrew S. Patrick, Scott Buffett, Hongyu Liu & Yonghua You

Editor information

Yuhua Luo

About this paper

Cite this paper

Korba, L. et al. (2008). Private Data Discovery for Privacy Compliance in Collaborative Environments. In: Luo, Y. (eds) Cooperative Design, Visualization, and Engineering. CDVE 2008. Lecture Notes in Computer Science, vol 5220. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88011-0_18

DOI: https://doi.org/10.1007/978-3-540-88011-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88010-3
Online ISBN: 978-3-540-88011-0
eBook Packages: Computer Science, Computer Science (R0)
