Abstract
Engineers often have to sift through long engineering documents to find the right pieces of information, which is a tiring and time-consuming job. To address this issue, researchers are increasingly devoting attention to new ways of helping information users, including engineers, to access and retrieve document content. The research reported in this paper explores how to use three key technologies: document decomposition (the study of document structure), document mark-up (with eXtensible Mark-up Language (XML), HyperText Mark-up Language (HTML), and Scalable Vector Graphics (SVG)), and a facetted classification mechanism. Document content extraction is implemented in Java. An Engineering Document Content Management System (EDCMS) developed in this research demonstrates that information providers can make document content more accessible to information users, including engineers.
The main features of the EDCMS are:
1) The EDCMS enables users, especially engineers, to access and retrieve information at content level rather than document level. In other words, it provides the pieces of information that answer specific questions, so engineers do not need to waste time sifting through a whole document to find them.
2) Users can access engineering document content through both the data and the metadata of a document.
3) Users can access and retrieve content objects, i.e. text, images and graphics (including engineering drawings), via multiple views and at different granularities based on decomposition schemes.
Experiments with the EDCMS have been conducted on semi-structured documents, a CADCAM textbook, and a set of project posters in the Engineering Design domain. Experimental results show that the system gives information users an effective way to access document content.
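As an illustration of retrieval at content level rather than document level, the following is a minimal Java sketch, not the EDCMS implementation. It assumes a document has already been decomposed and marked up in XML, and it uses the standard Java DOM and XPath APIs to pull out fragments at different granularities. The element and attribute names (`chapter`, `section`, `para`, `topic`), the sample text, and the class name `ContentRetrievalSketch` are hypothetical and chosen only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContentRetrievalSketch {

    // Hypothetical mark-up: element names, attribute names and text are illustrative only.
    static final String SAMPLE =
        "<document title='CADCAM textbook'>"
      + "<chapter topic='modelling'>"
      + "<section><title>Geometric modelling</title>"
      + "<para>Wireframe models store edges.</para>"
      + "<para>Solid models store volumes.</para></section>"
      + "</chapter>"
      + "<chapter topic='manufacture'>"
      + "<section><title>Numerical control</title>"
      + "<para>Part programs drive machine tools.</para></section>"
      + "</chapter>"
      + "</document>";

    // Returns the text of every node selected by the XPath expression.
    // The expression decides the granularity (titles, paragraphs, whole sections)
    // and may filter on metadata stored as attributes.
    static List<String> retrieve(String xml, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpathExpr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Wrapped so that callers do not have to handle checked exceptions.
            throw new IllegalStateException("Could not evaluate " + xpathExpr, e);
        }
    }

    public static void main(String[] args) {
        // Coarse view: section titles across the whole document.
        System.out.println(retrieve(SAMPLE, "//section/title"));
        // Fine view, filtered by metadata: paragraphs in chapters tagged 'manufacture'.
        System.out.println(retrieve(SAMPLE, "//chapter[@topic='manufacture']//para"));
    }
}
```

In this sketch the choice of XPath expression stands in for a decomposition scheme: the same marked-up document can be queried for titles, single paragraphs, or whole sections, and the `topic` attribute plays the role of metadata.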
Additional information
This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) (No. GR/R67507/01).
Shaofeng Liu received her B.Sc and M.Sc degrees from Hunan University, China and her Ph.D. degree from Loughborough University, UK. She is currently a research officer in the Engineering Innovative Manufacturing Research Centre (IMRC) at the University of Bath, UK.
Her current research interests include the study of information and knowledge structures, Web technology, information and knowledge systems, and the application of mark-up technology in the engineering (design and manufacture) domain.
Chris McMahon is a professor and the director of the Engineering IMRC at the University of Bath, UK. He has carried out research in the information requirements of engineering designers, in information and knowledge management systems for design, in risk and uncertainty management in design, in component durability and reliability, especially in the presence of residual stresses, and in design for remanufacturing, largely in conjunction with industry.
He has published his work widely, including over 150 refereed papers, a number of edited volumes and a textbook on computer-aided design and manufacture. His research interests include engineering design, especially concerning the application of computers to the management of information and uncertainty in design, and to design automation.
Mansur Darlington is a research officer in the Design Information & Knowledge Group of Engineering IMRC at the University of Bath, UK. For the last decade he has been involved in research associated with the capture and codification of engineers’ design knowledge and the development of methods for supporting engineers’ information needs, first in the University of Bath’s Engineering Design Centre, latterly in the IMRC.
His research interests include the capture and representation of design information and knowledge for reasoning about conceptual design, the development of reasoning techniques associated with engineering document and document content retrieval, and the development of methods for controlled document origination to support standardization for reuse and for information minimization.
Steve Culley is a professor of Engineering Design and head of Design in the Department of Mechanical Engineering at the University of Bath, UK. He has researched in the engineering design field for many years. In particular this has included the provision of information and knowledge to support engineering designers. He pioneered work into the introduction and use of the electronic catalogue for standard engineering components and has extended this work to deal with systems and assemblies.
He has over 150 publications and has recently co-authored a book on design and changeover.
Prof. Culley is a member of the EPSRC funded IMRC at the University of Bath, UK and is part of the ‘Grand Challenge’—Immortal Information and Through-life Knowledge Management. He is a fellow of the Institution of Mechanical Engineers.
Peter Wild graduated from Bournemouth University in psychology and computing, and received his M.Sc degree in human computer interaction from the University of London (Queen Mary). He is a research officer in the IMRC at the University of Bath, UK.
His interests include documents, empirical studies of engineers, design, task analysis, and user interface design.
Cite this article
Liu, S., McMahon, C., Darlington, M. et al. EDCMS: A content management system for engineering documents. Int J Automat Comput 4, 56–70 (2007). https://doi.org/10.1007/s11633-007-0056-x
DOI: https://doi.org/10.1007/s11633-007-0056-x