DOI: 10.1145/1858378.1858426
research-article

A metadata and annotation extractor from PDF document for semantic web

Published: 16 September 2010

Abstract

Research scholars undertake a literature survey to identify a problem they would like to address and its possible solutions. As part of this activity, they download research papers from the internet, read them, and write comments, observations, explanations, or questions either on a separate sheet of paper or on the paper itself. They use these notes and observations to firm up their understanding of the research domain and to define their research problems. These notes and observations are a very valuable knowledge asset for research.
My work is motivated by a desire to capture this knowledge and make it available to the community of research scholars, so that they can benefit from it.
In this paper, I present an editor that facilitates authoring annotations on PDF documents. I have designed a DTD (Document Type Definition) for the annotation document. This DTD contains the identity of the annotation Author, the identity of the paper on which the annotation is created, the Type of annotation, and Comment and Date_time elements. The Type field is an enumeration and may take one of the values "note", "comment", "insert", "help", or "paragraph". The value "insert" states that the annotation is not on the original PDF document but on another annotation. My tool provides a user-friendly interface to query the annotations on a PDF document and to classify documents on the basis of the number of comments and the relationships between annotations. My tool also extracts metadata from the PDF document, including title, author, keywords, summary, and date_time. The tool has been implemented using the Java PDFBox API.
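
The annotation document described above could be sketched roughly as follows, using only the XML parser that ships with the JDK. The element names Author, Comment, and Date_time and the five type values come from the abstract. The Paper element name, the content model, the sample values, and the class and method names are assumptions made for illustration. Because a DTD can only declare enumerated types on attributes, this sketch models the annotation type as an attribute, which may differ from the actual DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class AnnotationDtdSketch {

    // Hypothetical DTD, written as an internal subset so the example is
    // self-contained. The content model and the "type" attribute are assumptions.
    public static final String DTD =
          "<!DOCTYPE Annotation [\n"
        + "  <!ELEMENT Annotation (Author, Paper, Comment, Date_time)>\n"
        + "  <!ATTLIST Annotation type (note|comment|insert|help|paragraph) #REQUIRED>\n"
        + "  <!ELEMENT Author (#PCDATA)>\n"
        + "  <!ELEMENT Paper (#PCDATA)>\n"
        + "  <!ELEMENT Comment (#PCDATA)>\n"
        + "  <!ELEMENT Date_time (#PCDATA)>\n"
        + "]>\n";

    // Builds an illustrative annotation with the given type; all values are made up.
    public static String sample(String type) {
        return "<Annotation type=\"" + type + "\">\n"
             + "  <Author>reader-1</Author>\n"
             + "  <Paper>paper-42</Paper>\n"
             + "  <Comment>Check the evaluation setup.</Comment>\n"
             + "  <Date_time>2010-09-16T10:00:00</Date_time>\n"
             + "</Annotation>\n";
    }

    // Returns true if the annotation XML is well-formed and valid against DTD.
    public static boolean isValid(String annotationXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            final boolean[] ok = {true};
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored */ }
                public void error(SAXParseException e) { ok[0] = false; } // validity error
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // not well-formed
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + annotationXml)));
            return ok[0];
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(sample("note")));     // true
        System.out.println(isValid(sample("question"))); // false: not in the enumeration
    }
}
```

Declaring the type as an enumerated attribute lets the parser reject unknown annotation types at validation time, so the tool itself would not need a separate check for the five allowed values.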

Published In

A2CWiC '10: Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India
September 2010
425 pages
ISBN:9781450301947
DOI:10.1145/1858378
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. annotations
  2. metadata
  3. semantic web
  4. type

Qualifiers

  • Research-article

Conference

A2CWiC '10
A2CWiC '10: Emerging Trends in Computing
September 16 - 17, 2010
Coimbatore, India
