Extracting Structured Subject Information from Digital Document Archives

Liu, Jyi-Shane; Lee, Ching-Ying

doi:10.1007/11931584_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4312)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

Information extraction (IE) techniques can decode targeted subject information in documents and reduce text data to a set of structured core information. The implication for digital libraries is that IE can serve as an enabling tool to extend the value of digital document archives. We present an approach, called the sandwich extraction pattern, to address closely coupled template relation tasks. The approach provides interactive capabilities for task specification, domain knowledge acquisition, and output evaluation. This gives users (e.g., librarians) direct control over the design of value-added content products and the performance of IE tools. We conducted empirical validation by implementing an IE system, called SEP, and field testing it on a practical document archive. Encouraged by successful test runs, the NCCU library has formally initiated a project to develop a value-added content product of government personnel gazettes, including document images, electronic texts, and a personnel changes database.

Author information

Authors and Affiliations

Department of Computer Science, National Chengchi University, Taiwan, R.O.C.
Jyi-Shane Liu
University Library, National Chengchi University, Taiwan, R.O.C.
Jyi-Shane Liu
Department of English, National Taiwan Normal University, Taiwan, R.O.C.
Ching-Ying Lee

Editor information

Editors and Affiliations

University of Tsukuba, Tsukuba 1-2, Ibaraki, Japan
Shigeo Sugimoto
The University of Queensland, St Lucia, Queensland, Australia
Jane Hunter
Vienna University of Technology, Vienna, Austria
Andreas Rauber
Research Center for Knowledge Communities, University of Tsukuba, 305-8550, Ibaraki, Japan
Atsuyuki Morishima

About this paper

Cite this paper

Liu, JS., Lee, CY. (2006). Extracting Structured Subject Information from Digital Document Archives. In: Sugimoto, S., Hunter, J., Rauber, A., Morishima, A. (eds) Digital Libraries: Achievements, Challenges and Opportunities. ICADL 2006. Lecture Notes in Computer Science, vol 4312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11931584_17

DOI: https://doi.org/10.1007/11931584_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49375-4
Online ISBN: 978-3-540-49377-8
eBook Packages: Computer Science, Computer Science (R0)
