An Automated Document Filing System

Journal of Systems Integration

Abstract

TEXPROS (TEXt PROcessing System) is an automatic document processing system that supports text-based information representation and manipulation, conveying the meaning of the information stored in the texts of office documents. A dual modeling approach is employed to describe office documents and to support document search and retrieval. The frame templates that represent document classes are organized into a document type hierarchy. Based on its document type, the synopsis of a document is extracted to form its corresponding frame instance. According to user-predefined criteria, these frame instances are stored in different folders, which are organized as a folder organization (i.e., a repository of frame instances associated with their documents). The concept of linking folders establishes filing paths for automatically filing documents in the folder organization. By integrating the document type hierarchy and the folder organization, the dual modeling approach provides efficient access to frame instances by limiting each search to the frame instances of a given document type within those folders that appear most similar to the corresponding query.
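The dual modeling idea described above can be illustrated with a minimal Python sketch. It is not the TEXPROS implementation: the class names (`FrameTemplate`, `FrameInstance`, `Folder`), the `search` function, the exact-match criteria, and the sample data are all assumptions made for illustration. It only shows how a search might be confined to frame instances of one document type inside a chosen set of folders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FrameTemplate:
    """A document class; templates form a type hierarchy via `parent` (hypothetical model)."""
    name: str
    attributes: Tuple[str, ...]
    parent: Optional["FrameTemplate"] = None

    def is_a(self, other: "FrameTemplate") -> bool:
        # Walk up the hierarchy until we meet `other` or run out of ancestors.
        t: Optional[FrameTemplate] = self
        while t is not None:
            if t is other:
                return True
            t = t.parent
        return False


@dataclass
class FrameInstance:
    """Synopsis of one document, shaped by its frame template."""
    template: FrameTemplate
    values: Dict[str, str]
    document_id: str


@dataclass
class Folder:
    name: str
    instances: List[FrameInstance] = field(default_factory=list)


def search(folders: List[Folder], doc_type: FrameTemplate, **criteria: str) -> List[FrameInstance]:
    """Return instances of `doc_type` (or a subtype) in `folders` whose values match `criteria`."""
    hits = []
    for folder in folders:
        for inst in folder.instances:
            if inst.template.is_a(doc_type) and all(
                inst.values.get(key) == value for key, value in criteria.items()
            ):
                hits.append(inst)
    return hits


# Illustrative data only; none of these names come from the paper.
document = FrameTemplate("Document", ("sender", "date"))
memo = FrameTemplate("Memo", ("sender", "date", "subject"), parent=document)
letter = FrameTemplate("Letter", ("sender", "date", "receiver"), parent=document)

program_folder = Folder("Program", [
    FrameInstance(memo, {"sender": "Chair", "subject": "Exam schedule"}, "doc-1"),
    FrameInstance(letter, {"sender": "Chair", "receiver": "Dean"}, "doc-2"),
])
admin_folder = Folder("Admin", [
    FrameInstance(memo, {"sender": "Dean", "subject": "Budget"}, "doc-3"),
])

memo_hits = search([program_folder], memo, sender="Chair")
all_hits = search([program_folder, admin_folder], document, sender="Chair")
```

In this sketch, narrowing either the folder list or the document type shrinks the set of frame instances that must be examined, which is the effect the abstract attributes to combining the two models.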

This paper presents an agent-based document filing system using folder organization. A storage architecture is presented that incorporates the document type hierarchy, the folder organization and the original document storage into a three-level storage system. This folder organization supports an effective filing strategy and allows rapid frame instance searches by confining the search to the actual predicate-driven retrieval method. A predicate specification is proposed for specifying the criteria on filing paths in terms of user-predefined predicates that govern document filing. A method for evaluating whether a given frame instance satisfies the criteria of a filing path is presented. The basic operations for constructing and reorganizing a folder organization are proposed.
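A small Python sketch may help make predicate-driven filing concrete. The abstract does not give the paper's predicate specification or evaluation method, so everything below is an assumption: frame instances are plain dictionaries, a predicate is an ordinary Python function, and a frame instance is deposited in every folder it reaches along a filing path. The names `Folder`, `FilingPath`, `link` and `file_instance` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Frame = Dict[str, str]                 # a frame instance, simplified to attribute/value pairs
Predicate = Callable[[Frame], bool]    # user-predefined filing criterion (assumed form)


@dataclass
class FilingPath:
    """A link from a folder to a target folder, guarded by a predicate."""
    predicate: Predicate
    target: "Folder"


@dataclass
class Folder:
    name: str
    instances: List[Frame] = field(default_factory=list)
    links: List[FilingPath] = field(default_factory=list)


def link(parent: Folder, child: Folder, predicate: Predicate) -> None:
    """Establish a filing path from `parent` to `child`."""
    parent.links.append(FilingPath(predicate, child))


def file_instance(folder: Folder, frame: Frame, filed_in: Optional[List[str]] = None) -> List[str]:
    """Deposit `frame` in `folder`, then follow each filing path whose predicate it satisfies.

    Returns the names of the folders that received the frame, in visiting order.
    """
    if filed_in is None:
        filed_in = []
    if folder.name in filed_in:        # already filed here via another path
        return filed_in
    folder.instances.append(frame)
    filed_in.append(folder.name)
    for path in folder.links:
        if path.predicate(frame):
            file_instance(path.target, frame, filed_in)
    return filed_in


# Illustrative folder organization; the folder names and predicates are invented.
root = Folder("Root")
department = Folder("Department")
exams = Folder("Exams")
budget = Folder("Budget")

link(root, department, lambda f: f.get("organization") == "CIS")
link(department, exams, lambda f: "exam" in f.get("subject", "").lower())
link(department, budget, lambda f: "budget" in f.get("subject", "").lower())

memo = {"type": "Memo", "organization": "CIS", "subject": "Qualifying exam dates"}
visited = file_instance(root, memo)
```

Here `visited` lists the folders the memo reached: it satisfies the predicates on the path from the root through the department folder to the exams folder, and fails the one guarding the budget folder.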

Cite this article

Fan, X., Liu, Q. & Ng, P. An Automated Document Filing System. Journal of Systems Integration 9, 223–262 (1999). https://doi.org/10.1023/A:1026484408645

  • DOI: https://doi.org/10.1023/A:1026484408645
