KOSHIK- A Large-scale Distributed Computing Framework for NLP

Peter Exner; Pierre Nugues

Topics: Document Analysis and Understanding; Natural Language Processing

In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods (ICPRAM), Volume 1, pages 463-470, 2014, ESEO, Angers, Loire Valley, France

Authors: Peter Exner and Pierre Nugues

Affiliation: Lund University, Sweden

Keyword(s): NLP Framework, Distributed Computing, Large-scale Processing, Hadoop, MapReduce.

Related Ontology Subjects/Areas/Topics: Applications ; Artificial Intelligence ; Document Analysis and Understanding ; Knowledge Engineering and Ontology Development ; Knowledge-Based Systems ; Natural Language Processing ; Pattern Recognition ; Software Engineering ; Symbolic Systems

Abstract: In this paper, we describe KOSHIK, an end-to-end framework to process the unstructured natural language content of multilingual documents. We used the Hadoop distributed computing infrastructure to build this framework as it enables KOSHIK to easily scale by adding inexpensive commodity hardware. We designed an annotation model that allows the processing algorithms to incrementally add layers of annotation without modifying the original document. We used the Avro binary format to serialize the documents. Avro is designed for Hadoop and allows other data warehousing tools to directly query the documents. This paper reports the implementation choices and details of the framework, the annotation model, the options for querying processed data, and the parsing results on the English and Swedish editions of Wikipedia.
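The layered annotation idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not KOSHIK's actual data model or its Avro schema, which this page does not show: the names `Document`, `Annotation`, `add_layer`, `tokenize`, and `tag_capitalized` are hypothetical, and the two toy processing steps stand in for real NLP components. The sketch only assumes that annotations are stored as character offsets into an unchanged text, so each step adds a new layer without touching the original document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    begin: int      # character offset where the span starts (inclusive)
    end: int        # character offset where the span ends (exclusive)
    features: dict  # free-form attributes attached to the span


@dataclass
class Document:
    text: str                                   # original content, never modified
    layers: dict = field(default_factory=dict)  # layer name -> list of Annotation

    def add_layer(self, name, annotations):
        """Attach a new named layer; existing layers are left untouched."""
        if name in self.layers:
            raise ValueError(f"layer {name!r} already exists")
        self.layers[name] = list(annotations)

    def covered_text(self, ann):
        """Return the substring of the original text that a span points to."""
        return self.text[ann.begin:ann.end]


def tokenize(doc):
    """Toy whitespace tokenizer (hypothetical step) that adds a 'token' layer."""
    anns, pos = [], 0
    for word in doc.text.split():
        begin = doc.text.index(word, pos)
        end = begin + len(word)
        anns.append(Annotation(begin, end, {"form": word}))
        pos = end
    doc.add_layer("token", anns)


def tag_capitalized(doc):
    """Toy second step that reads the 'token' layer and adds its own layer."""
    anns = [
        Annotation(t.begin, t.end, {"capitalized": True})
        for t in doc.layers["token"]
        if doc.covered_text(t)[0].isupper()
    ]
    doc.add_layer("capitalized", anns)


doc = Document("Lund University is in Sweden")
tokenize(doc)
tag_capitalized(doc)
```

Because each step writes only to its own layer, later steps can be added or rerun independently, and `doc.text` stays identical to the input throughout.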

CC BY-NC-ND 4.0

Paper citation in several formats:

Exner, P. and Nugues, P. (2014). KOSHIK- A Large-scale Distributed Computing Framework for NLP. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-018-5; ISSN 2184-4313, SciTePress, pages 463-470. DOI: 10.5220/0004707704630470

@conference{icpram14,
author={Peter Exner and Pierre Nugues},
title={KOSHIK- A Large-scale Distributed Computing Framework for NLP},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2014},
pages={463-470},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004707704630470},
isbn={978-989-758-018-5},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - KOSHIK- A Large-scale Distributed Computing Framework for NLP
SN - 978-989-758-018-5
IS - 2184-4313
AU - Exner, P.
AU - Nugues, P.
PY - 2014
SP - 463
EP - 470
DO - 10.5220/0004707704630470
PB - SciTePress