Trustworthy keyword search for compliance storage

  • Special Issue Paper
  • The VLDB Journal

Abstract

Intense regulatory focus on secure retention of electronic records has led to a need to ensure that records are trustworthy, i.e., able to provide irrefutable proof and accurate details of past events. In this paper, we analyze the requirements for a trustworthy index to support keyword-based search queries. We argue that trustworthy index entries must be durable: the index must be updated when new documents arrive, and not periodically deleted and rebuilt. To this end, we propose a scheme for efficiently updating an inverted index, based on judicious merging of the posting lists of terms. Through extensive simulations and experiments with two real-world data sets and workloads, we demonstrate that the scheme achieves online update speed while maintaining good query performance. We also present and evaluate jump indexes, a novel trustworthy and efficient index for join operations on posting lists for multi-keyword queries. Jump indexes support insert, lookup and range queries in time logarithmic in the number of indexed documents.
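To make the abstract's terminology concrete, the following is a minimal illustrative sketch of an append-only inverted index with a multi-keyword (conjunctive) query. It is not the scheme proposed in the paper: the class name `AppendOnlyIndex` and its methods are hypothetical, posting lists live in memory rather than on write-once storage, and plain binary search over a sorted list stands in for the jump index, whose actual structure is not described on this page. The posting-list merging strategy mentioned in the abstract is not modeled.

```python
from bisect import bisect_left
from collections import defaultdict


class AppendOnlyIndex:
    """Toy inverted index whose posting lists are only ever appended to.

    Hypothetical illustration; not the paper's data structure.
    """

    def __init__(self):
        self.postings = defaultdict(list)  # term -> ascending list of doc IDs
        self.next_doc_id = 0

    def add_document(self, text):
        """Index a document as soon as it arrives; no entry is rewritten."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        for term in set(text.lower().split()):
            # Doc IDs only increase, so appending keeps each list sorted.
            self.postings[term].append(doc_id)
        return doc_id

    def lookup(self, term, doc_id):
        """Binary-search one posting list: O(log n) in the list length."""
        plist = self.postings.get(term.lower(), [])
        i = bisect_left(plist, doc_id)
        return i < len(plist) and plist[i] == doc_id

    def search(self, *terms):
        """Return IDs of documents that contain every query term."""
        if not terms:
            return []
        # Scan the shortest posting list and probe the others.
        ordered = sorted(terms, key=lambda t: len(self.postings.get(t.lower(), [])))
        first, rest = ordered[0], ordered[1:]
        candidates = self.postings.get(first.lower(), [])
        return [d for d in candidates if all(self.lookup(t, d) for t in rest)]


index = AppendOnlyIndex()
index.add_document("quarterly audit report")
index.add_document("audit trail for email retention")
index.add_document("email about quarterly audit")
print(index.search("audit", "quarterly"))  # [0, 2]
print(index.search("email", "audit"))      # [1, 2]
```

Scanning the shortest posting list and probing the others keeps the cost of a conjunctive query proportional to the rarest term, which is the role a logarithmic-time lookup structure plays in a posting-list join.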

Author information

Corresponding author

Correspondence to Soumyadeb Mitra.

About this article

Cite this article

Mitra, S., Winslett, M., Hsu, W.W. et al. Trustworthy keyword search for compliance storage. The VLDB Journal 17, 225–242 (2008). https://doi.org/10.1007/s00778-007-0069-7

  • DOI: https://doi.org/10.1007/s00778-007-0069-7
