Personal Document Management and Retrieval: A Knowledge-Based Approach

Journal of Systems Integration

Abstract

This paper presents a knowledge-based approach to managing and retrieving personal documents. The dual document model consists of a document type hierarchy and a folder organization. The document type hierarchy captures the layout, logical, and conceptual structures of documents. The folder organization mimics the user's real-world filing system for organizing and storing documents in an office environment. A predicate-based representation of documents is formalized for specifying knowledge about documents, and both document filing and retrieval are predicate-driven. The filing criteria for the folders, specified as predicates, govern the grouping of frame instances regardless of their document types. The notions of document type hierarchy and folder organization are incorporated into a multilevel architecture for document storage, which supports various text-based and content-based multimedia information retrieval techniques. The paper also proposes a knowledge-based query-preprocessing algorithm that reduces the search space. To automate document filing and retrieval, a predicate evaluation engine with a knowledge base is proposed; a learning agent is responsible for acquiring the knowledge needed by the evaluation engine.
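
The predicate-driven filing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Frame`, `Folder`, and `file_frame`, the dictionary representation of a frame instance, and the sample attributes are all assumptions made for illustration. The sketch only shows the stated idea that a folder's filing criterion is a predicate over frame instances and applies regardless of document type.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a frame instance is modeled as a flat dict of attribute -> value,
# with a "type" key naming its document type. The paper's frames are richer.
Frame = Dict[str, str]
Predicate = Callable[[Frame], bool]


@dataclass
class Folder:
    """A folder whose filing criterion is a predicate over frame instances."""
    name: str
    criterion: Predicate
    frames: List[Frame] = field(default_factory=list)


def file_frame(frame: Frame, folders: List[Folder]) -> List[str]:
    """File the frame into every folder whose criterion it satisfies.

    Returns the names of the matching folders, in folder order.
    The document type is never consulted unless a criterion asks for it.
    """
    matched = []
    for folder in folders:
        if folder.criterion(frame):
            folder.frames.append(frame)
            matched.append(folder.name)
    return matched


# Hypothetical folders and documents, for illustration only.
folders = [
    Folder("From Ng", lambda f: f.get("sender") == "P.A. Ng"),
    Folder("Budget", lambda f: "budget" in f.get("subject", "").lower()),
]
memo = {"type": "memo", "sender": "P.A. Ng", "subject": "Budget review"}
letter = {"type": "letter", "sender": "X. Fan", "subject": "Annual budget"}

memo_folders = file_frame(memo, folders)
letter_folders = file_frame(letter, folders)
```

In this sketch the memo lands in both folders and the letter only in "Budget", so the "Budget" folder ends up holding frame instances of two different document types. A real evaluation engine would draw its predicates from a knowledge base rather than from hand-written lambdas.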

Cite this article

Fan, X., Ng, P.A. Personal Document Management and Retrieval: A Knowledge-Based Approach. Journal of Systems Integration 8, 287–312 (1998). https://doi.org/10.1023/A:1026461329174

  • DOI: https://doi.org/10.1023/A:1026461329174
