DOI: 10.1145/1866835.1866847
research-article

HengHa: data harvesting detection on hidden databases

Published: 08 October 2010

ABSTRACT

The back-end databases of web-based applications are a major data security concern for enterprises. The problem becomes more critical with the proliferation of enterprise-hosted web applications in the cloud. Prior work has concentrated on malicious attacks that try to break into the database by exploiting vulnerabilities of web applications. Little work has focused on the threat of data harvesting through web form interfaces, in which large collections of the underlying data can be harvested and sensitive information learned by iteratively submitting legitimate queries and analyzing the returned results to design new queries. To defend against data harvesting without compromising usability, we consider a detection approach. We summarize the characteristics of data harvesting, and propose the notions of query correlation and result coverage for data harvesting detection. We design a detection system called HengHa, in which Heng examines the correlation among queries in a session, and Ha evaluates the data coverage of the results of queries in the same session. The experimental results verify the effectiveness and efficiency of HengHa for data harvesting detection.
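
The abstract names query correlation and result coverage but does not define them, so the following is only an illustrative sketch of how two such per-session signals might be computed. Everything concrete here is an assumption and not taken from the paper: modelling a query as a set of (attribute, value) predicates, using Jaccard similarity between consecutive queries as the correlation measure, measuring coverage as the fraction of distinct tuples returned, and the hypothetical function names and thresholds.

```python
from typing import FrozenSet, List, Set, Tuple

# Assumption: a query is modelled as a set of (attribute, value) predicates.
Query = FrozenSet[Tuple[str, str]]


def query_correlation(queries: List[Query]) -> float:
    """Mean Jaccard similarity between consecutive queries in one session.

    Jaccard similarity is a stand-in measure chosen for illustration;
    it is not claimed to be the measure used by Heng.
    """
    if len(queries) < 2:
        return 0.0
    sims = []
    for prev, curr in zip(queries, queries[1:]):
        union = prev | curr
        sims.append(len(prev & curr) / len(union) if union else 0.0)
    return sum(sims) / len(sims)


def result_coverage(results: List[Set[int]], db_size: int) -> float:
    """Fraction of database tuples returned at least once in one session.

    Tuples are identified by integer ids; db_size is assumed to be known
    to the defender. This is not claimed to be the measure used by Ha.
    """
    if db_size <= 0:
        raise ValueError("db_size must be positive")
    seen: Set[int] = set()
    for result in results:
        seen |= result
    return len(seen) / db_size


def is_suspicious(
    queries: List[Query],
    results: List[Set[int]],
    db_size: int,
    corr_threshold: float = 0.3,  # hypothetical threshold
    cov_threshold: float = 0.5,   # hypothetical threshold
) -> bool:
    """Flag a session only when both signals are high (an assumed rule)."""
    return (
        query_correlation(queries) >= corr_threshold
        and result_coverage(results, db_size) >= cov_threshold
    )


# A toy session that sweeps one attribute while holding another fixed.
session_queries: List[Query] = [
    frozenset({("make", "ford"), ("year", "2001")}),
    frozenset({("make", "ford"), ("year", "2002")}),
    frozenset({("make", "ford"), ("year", "2003")}),
]
session_results: List[Set[int]] = [{1, 2, 3}, {3, 4, 5}, {6, 7}]
```

In this toy session each query differs from the previous one in a single predicate, and together the results touch most of a ten-tuple database, which is the kind of combined pattern a detector built on these two notions would look for. Requiring both signals to exceed their thresholds is a design choice of the sketch, intended to avoid flagging sessions that are merely repetitive or merely broad.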


Published in

CCSW '10: Proceedings of the 2010 ACM workshop on Cloud computing security workshop
October 2010
118 pages
ISBN: 9781450300896
DOI: 10.1145/1866835

Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 37 of 108 submissions, 34%
