Design of a Digital Library for Early 20th Century Medico-legal Documents

Thoma, George R.; Mao, Song; Misra, Dharitri; Rees, John

doi:10.1007/11863878_13

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4172)

Abstract

The research value of an important collection of government documents to historians of medicine and law is enhanced by a digital library of the collection now being designed at the U.S. National Library of Medicine. This paper presents work toward the design of a system for the preservation of, and access to, this material, focusing mainly on the automated extraction of the descriptive metadata needed for future access. Because manually entering these metadata for thousands of documents is unaffordable, automation is required. Successful metadata extraction relies on accurate classification of key textlines in each document. Methods are described for selecting the scanning alternatives that yield high OCR conversion performance, and for combining a Support Vector Machine (SVM) and a Hidden Markov Model (HMM) to classify textlines and extract metadata. Experimental results from our initial research toward an optimal textline classifier and metadata extractor are reported.
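The abstract names a combination of an SVM and an HMM for textline classification but does not say how the two are coupled. One common way to couple them, offered here only as a sketch under stated assumptions and not as the authors' implementation, is to treat each textline's SVM class probabilities as HMM emission scores, let start and transition probabilities encode the order in which textline types tend to appear on a page, and run Viterbi decoding to pick the jointly most likely label sequence. The class names (`title`, `metadata`, `body`), all probabilities, and the `viterbi` function are hypothetical; the per-line probabilities stand in for the output of a probability-calibrated SVM.

```python
import math
from typing import Dict, List, Sequence


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    line_probs: Sequence[Dict[str, float]],
    labels: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
) -> List[str]:
    """Most likely class sequence for the textlines of one page.

    line_probs[t][c]: classifier score for class c on textline t (assumed to be
    a probability-calibrated SVM output), used directly as the emission term.
    start[c], trans[p][c]: hypothetical class-order (syntax) probabilities.
    """
    # best[c] = log-score of the best label path that ends in class c
    best = {c: _log(start[c]) + _log(line_probs[0][c]) for c in labels}
    backptrs: List[Dict[str, str]] = []
    for obs in line_probs[1:]:
        new_best: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for c in labels:
            # Best predecessor class for c, including the transition cost
            prev = max(labels, key=lambda p: best[p] + _log(trans[p][c]))
            new_best[c] = best[prev] + _log(trans[prev][c]) + _log(obs[c])
            ptr[c] = prev
        best = new_best
        backptrs.append(ptr)
    # Trace back from the highest-scoring final class
    path = [max(labels, key=lambda c: best[c])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    LABELS = ["title", "metadata", "body"]  # hypothetical textline classes
    START = {"title": 0.8, "metadata": 0.1, "body": 0.1}
    TRANS = {
        "title": {"title": 0.2, "metadata": 0.7, "body": 0.1},
        "metadata": {"title": 0.0, "metadata": 0.5, "body": 0.5},
        "body": {"title": 0.0, "metadata": 0.1, "body": 0.9},
    }
    # Made-up per-line SVM probabilities; the second line is ambiguous
    lines = [
        {"title": 0.6, "metadata": 0.3, "body": 0.1},
        {"title": 0.5, "metadata": 0.4, "body": 0.1},
        {"title": 0.1, "metadata": 0.2, "body": 0.7},
    ]
    print([max(p, key=p.get) for p in lines])    # ['title', 'title', 'body']
    print(viterbi(lines, LABELS, START, TRANS))  # ['title', 'metadata', 'body']
```

With these made-up numbers, taking each line's most probable class on its own would label the second line `title`, but the transition model (a title is usually followed by metadata) moves it to `metadata`. A fuller hybrid would often divide the SVM posteriors by class priors before decoding; that step is omitted here for brevity.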

Author information

Authors and Affiliations

U.S. National Library of Medicine, Bethesda, Maryland, 20894, USA
George R. Thoma, Song Mao, Dharitri Misra & John Rees

Editor information

Editors and Affiliations

Julio Gonzalo
Costantino Thanos, Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via Moruzzi, 1, 56124, Pisa, Italy
M. Felisa Verdejo, Dpto. Lenguajes y Sistemas Informáticos, UNED
Rafael C. Carrasco, Dep. de Lenguajes y Sistemas Informáticos, Universidad de Alicante, E-03071, Alicante, Spain

About this paper

Cite this paper

Thoma, G.R., Mao, S., Misra, D., Rees, J. (2006). Design of a Digital Library for Early 20th Century Medico-legal Documents. In: Gonzalo, J., Thanos, C., Verdejo, M.F., Carrasco, R.C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2006. Lecture Notes in Computer Science, vol 4172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11863878_13

DOI: https://doi.org/10.1007/11863878_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44636-1
Online ISBN: 978-3-540-44638-5
eBook Packages: Computer Science; Computer Science (R0)
