research-article

HengHa: data harvesting detection on hidden databases

Authors:
Shiyuan Wang

University of California, Santa Barbara, Santa Barbara, CA, USA

Divyakant Agrawal

University of California, Santa Barbara, Santa Barbara, CA, USA

Amr El Abbadi

University of California, Santa Barbara, Santa Barbara, CA, USA

CCSW '10: Proceedings of the 2010 ACM workshop on Cloud computing security workshop, October 2010, Pages 59–64, https://doi.org/10.1145/1866835.1866847

Published: 08 October 2010


ABSTRACT

The back-end databases of web-based applications are a major data security concern for enterprises. The problem becomes more critical with the proliferation of enterprise-hosted web applications in the cloud. Prior work has concentrated on malicious attacks that try to break into the database by exploiting vulnerabilities of web applications. Little work has focused on the threat of data harvesting through web form interfaces, in which an attacker iteratively submits legitimate queries and analyzes the returned results to design new queries, thereby harvesting large collections of the underlying data and learning sensitive information. To defend against data harvesting without compromising usability, we consider a detection approach. We summarize the characteristics of data harvesting, and propose the notions of query correlation and result coverage for data harvesting detection. We design a detection system called HengHa, in which Heng examines the correlation among queries in a session, and Ha evaluates the data coverage of the results of queries in the same session. The experimental results verify the effectiveness and efficiency of HengHa for data harvesting detection.
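
To make the two notions concrete, the following is a minimal illustrative sketch in Python. It is not the HengHa algorithm: the abstract does not define either measure, so everything here is an assumption. Query correlation is approximated as the mean Jaccard similarity between the predicate sets of consecutive queries in a session, and result coverage as the fraction of database tuples returned at least once in the session. The names `Query`, `query_correlation`, `result_coverage`, `looks_like_harvesting`, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """One web-form query in a session (hypothetical representation)."""
    predicates: frozenset  # (attribute, value) pairs submitted through the form
    result_ids: frozenset  # IDs of the tuples the query returned


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def query_correlation(session):
    """Assumed stand-in: mean predicate similarity of consecutive queries."""
    if len(session) < 2:
        return 0.0
    sims = [jaccard(p.predicates, q.predicates)
            for p, q in zip(session, session[1:])]
    return sum(sims) / len(sims)


def result_coverage(session, db_size):
    """Assumed stand-in: share of the database returned at least once."""
    if db_size <= 0:
        raise ValueError("db_size must be positive")
    seen = set()
    for q in session:
        seen |= q.result_ids
    return len(seen) / db_size


def looks_like_harvesting(session, db_size, corr_min=0.5, cov_min=0.3):
    """Flag a session when both measures reach their (arbitrary) thresholds."""
    return (query_correlation(session) >= corr_min
            and result_coverage(session, db_size) >= cov_min)


# Example session: the same attribute swept over successive values.
session = [
    Query(frozenset({("make", "honda"), ("year", 2005)}), frozenset({1, 2, 3})),
    Query(frozenset({("make", "honda"), ("year", 2006)}), frozenset({4, 5})),
    Query(frozenset({("make", "honda"), ("year", 2007)}), frozenset({6, 7, 8})),
]
flagged = looks_like_harvesting(session, db_size=10, corr_min=0.3)
```

Requiring both signals in this sketch mirrors the division of labor described in the abstract, where Heng looks at the queries and Ha looks at their results; how the actual system combines the two is not stated here.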



Published in
CCSW '10: Proceedings of the 2010 ACM workshop on Cloud computing security workshop
October 2010, 118 pages
ISBN: 9781450300896
DOI: 10.1145/1866835
Program Chairs: Adrian Perrig (Carnegie Mellon University, USA), Radu Sion (Stony Brook University, USA)
Copyright © 2010 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
crawling
data harvesting detection
query correlation
result coverage
sampling

Acceptance Rates
Overall Acceptance Rate: 37 of 108 submissions, 34%