Poster
DOI: 10.1145/1557914.1557987

When printed hypertexts go digital: information extraction from the parsing of indices

Published: 29 June 2009

Abstract

Modern critical editions of ancient works generally include manually created indices of the other sources quoted in the text. Because such indices can be regarded as a form of domain-specific language, this paper presents a parsing-based approach to extracting information from them, in support of creating a collection of fragmentary texts. The paper first considers the characteristics and structure of quotation indices and their importance when dealing with fragmentary texts. It then presents the results of applying a fuzzy parser to the OCR transcription of an index of quotations, so that information can be extracted from potentially noisy input.
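As a rough illustration of the idea of fuzzy parsing over noisy index lines, the sketch below recognizes only the parts of the input that look like index entries and skips everything else. It is not the authors' parser: the entry layout (abbreviated author, abbreviated work, passage number, dot leader, page list), the names `IndexEntry`, `ENTRY`, `OCR_DIGIT_FIXES`, and `parse_index`, and the particular OCR confusions handled are all assumptions made for this example, and a regular expression stands in for a real grammar.

```python
import re
from typing import List, NamedTuple, Tuple


class IndexEntry(NamedTuple):
    author: str       # abbreviated author, e.g. "Hom." (assumed layout)
    work: str         # abbreviated work title, e.g. "Il."
    passage: str      # cited passage, e.g. "1.1"
    pages: List[int]  # pages of the edition where the quotation occurs


# Assumed OCR confusions inside numeric fields: l/I read for 1, O read for 0.
OCR_DIGIT_FIXES = str.maketrans({"l": "1", "I": "1", "O": "0"})

# Hypothetical entry shape: "Author. Work. passage ..... page, page".
# The pattern is searched, not anchored at the start, so leading noise
# before the author abbreviation is tolerated.
ENTRY = re.compile(
    r"(?P<author>[A-Z][a-z]+\.)\s+"
    r"(?P<work>[A-Z][a-z]+\.)\s+"
    r"(?P<passage>[\dlIO]+(?:\.[\dlIO]+)*)"
    r"\s+[.\s]*"  # dot leader or plain spacing
    r"(?P<pages>[\dlIO]+(?:\s*,\s*[\dlIO]+)*)\s*$"
)


def parse_index(text: str) -> Tuple[List[IndexEntry], List[str]]:
    """Return the recognized entries and the non-empty lines that were skipped."""
    entries: List[IndexEntry] = []
    skipped: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ENTRY.search(line)
        if match is None:
            # Fuzzy behaviour: unrecognized input is skipped, not treated as an error.
            skipped.append(line)
            continue
        passage = match.group("passage").translate(OCR_DIGIT_FIXES)
        pages = [
            int(page.strip().translate(OCR_DIGIT_FIXES))
            for page in match.group("pages").split(",")
        ]
        entries.append(
            IndexEntry(match.group("author"), match.group("work"), passage, pages)
        )
    return entries, skipped


# Made-up sample input, including a heading and OCR-damaged digits.
SAMPLE = """INDEX LOCORUM
Hom. Il. 1.1 ........ 23, 45
~~ Hom. Od. l2.4O .... l07
"""

entries, skipped = parse_index(SAMPLE)
for entry in entries:
    print(entry)
print("skipped:", skipped)
```

Keeping the skipped lines, instead of discarding them silently, makes it possible to review what the parser failed to recognize, which matters when the input is an OCR transcription.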

Published In

HT '09: Proceedings of the 20th ACM conference on Hypertext and hypermedia
June 2009
410 pages
ISBN: 9781605584867
DOI: 10.1145/1557914

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. indices
  2. information extraction
  3. parsing
  4. printed hypertexts

Qualifiers

  • Poster

Conference

HT '09
Acceptance Rates

Overall Acceptance Rate 378 of 1,158 submissions, 33%
