Towards smarter documents

DOI: 10.1145/1031171.1031287
Published: 13 November 2004

Abstract

Document analysis research typically focuses on document image understanding or on classic problems in text classification, clustering, summarization and discovery. While these are important aspects of document management, in practice document lifecycles are often determined by the context of the business process to which they are relevant. It therefore becomes necessary for document analysis techniques to recognize and leverage the contextual information provided by a supporting schema and business process. This paper presents an intelligent document management framework with relevant document analysis, metadata extraction, and business process association algorithms and methodology. The architecture supporting this framework seamlessly integrates a runtime environment with an authoring environment by combining relational data modeling tools with document classification techniques. The runtime environment accepts incoming documents, classifies them, extracts metadata and executes customized business logic. The authoring environment supports the association of a class of documents with a relational document schema, identification of attribute values that must be extracted automatically, generation of relevant business logic, and deployment of authoring artifacts into the runtime architecture. We demonstrate the use of this framework with representative real-world document-transformative applications.
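
The runtime flow described in the abstract (accept a document, classify it, extract metadata, execute business logic) can be illustrated with a minimal sketch. This is not the paper's implementation: the keyword-count classifier, the regular-expression extractors, and every name below (`DocumentClass`, `classify`, `extract_metadata`, `process`, the sample classes) are hypothetical stand-ins chosen only to make the four stages concrete.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DocumentClass:
    """Hypothetical authoring artifact: one class of documents."""
    name: str
    keywords: List[str]                     # used by the toy classifier
    attribute_patterns: Dict[str, str]      # attribute name -> regex with one group
    handler: Callable[[Dict[str, str]], str]  # stand-in for business logic


def classify(text: str, classes: List[DocumentClass]) -> Optional[DocumentClass]:
    """Pick the class whose keywords occur most often (assumed toy classifier)."""
    lowered = text.lower()
    best, best_score = None, 0
    for doc_class in classes:
        score = sum(lowered.count(k) for k in doc_class.keywords)
        if score > best_score:
            best, best_score = doc_class, score
    return best


def extract_metadata(text: str, doc_class: DocumentClass) -> Dict[str, str]:
    """Extract the attribute values declared for this document class."""
    metadata = {}
    for attribute, pattern in doc_class.attribute_patterns.items():
        match = re.search(pattern, text)
        if match:
            metadata[attribute] = match.group(1)
    return metadata


def process(text: str, classes: List[DocumentClass]) -> str:
    """Runtime flow: classify, extract metadata, run the class's handler."""
    doc_class = classify(text, classes)
    if doc_class is None:
        return "unclassified"
    return doc_class.handler(extract_metadata(text, doc_class))


# Hypothetical document classes, as an authoring step might deploy them.
CLASSES = [
    DocumentClass(
        name="invoice",
        keywords=["invoice", "amount due"],
        attribute_patterns={
            "invoice_id": r"Invoice\s*#\s*(\w+)",
            "total": r"Amount due:\s*\$?([\d.]+)",
        },
        handler=lambda m: f"invoice {m.get('invoice_id')} -> accounts payable "
                          f"(total {m.get('total')})",
    ),
    DocumentClass(
        name="claim",
        keywords=["claim", "policy"],
        attribute_patterns={"policy_no": r"Policy\s*#\s*(\w+)"},
        handler=lambda m: f"claim on policy {m.get('policy_no')} -> claims desk",
    ),
]

if __name__ == "__main__":
    print(process("Invoice #A123\nAmount due: $45.00", CLASSES))
```

In this sketch the list of `DocumentClass` objects plays the role of the deployed authoring artifacts, so supporting a new kind of document means adding an entry rather than changing `process`.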

Published In

CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
November 2004
678 pages
ISBN:1581138741
DOI:10.1145/1031171

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification
  2. content
  3. processes
  4. workflow

Qualifiers

  • Article

Conference

CIKM04: Conference on Information and Knowledge Management
November 8 - 13, 2004
Washington, D.C., USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2019) Web-Log-Driven Business Activity Monitoring. Computer 38:3 (61-68). DOI: 10.1109/MC.2005.109. Online publication date: 5-Jan-2019
  • (2013) XML-based intelligent document technology and its development. IEEE Conference Anthology (1-6). DOI: 10.1109/ANTHOLOGY.2013.6785013. Online publication date: Jan-2013
  • (2012) Record Management and Design Reuse. 49th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference; 16th AIAA/ASME/AHS Adaptive Structures Conference; 10t... DOI: 10.2514/6.2008-2005. Online publication date: 14-Jun-2012
  • (2011) Intelligent Document Gateway: A Service System Case Study and Analysis. Service Systems Implementation (37-49). DOI: 10.1007/978-1-4419-7904-9_3. Online publication date: 6-Jan-2011
  • (2010) Intelligent document routing as a first step towards workflow automation. Proceedings of the 4th international conference on Leveraging applications of formal methods, verification, and validation - Volume Part I (276-284). DOI: 10.5555/1939281.1939309. Online publication date: 18-Oct-2010
  • (2010) Intelligent Document Routing as a First Step towards Workflow Automation: A Case Study Implemented in SQL. Leveraging Applications of Formal Methods, Verification, and Validation (276-284). DOI: 10.1007/978-3-642-16558-0_24. Online publication date: 2010
  • (2006) Towards Scaleable and Adaptive Document Routing Services. Proceedings of the IEEE International Conference on Services Computing (311-314). DOI: 10.1109/SCC.2006.108. Online publication date: 18-Sep-2006
  • (2005) Eclipse modeling framework for document management. Proceedings of the 2005 ACM symposium on Document engineering (220-222). DOI: 10.1145/1096601.1096653. Online publication date: 2-Nov-2005
  • (2005) Exploiting XML technologies for intelligent document routing. Proceedings of the 2005 ACM symposium on Document engineering (26-28). DOI: 10.1145/1096601.1096609. Online publication date: 2-Nov-2005
