
Scaling NLP algorithms to meet high demand


Abstract:

The growth of digital information and the richness of data shared online make it increasingly valuable to process large amounts of data at a very high throughput rate. At the same time, rising interest in natural language processing (NLP) has resulted in the development of a great number of algorithms designed to perform a variety of NLP tasks. There is a need for frameworks that enable multiple users and applications to run individual NLP algorithms, or combinations of them, to derive relevant information from data [1]. In this work, we take multiple NLP algorithms that adhere to the ADEPT framework and deploy them on distributed processing architectures to satisfy the dual needs of serving a large user group and meeting high throughput standards, while reducing the time from lab to production environment. The ADEPT framework provides a set of uniform APIs for interacting with a diverse set of NLP algorithms by defining a set of data structures for representing NLP concepts [2]. It offers multiple access points for interacting with these algorithms: a REST API, a serialized Data API, and processor components that can be used in a larger pipeline. The comprehensive ADEPT architecture can support algorithms that perform sentence-level, document-level, or corpus-level text processing, allowing a wide range of NLP algorithms to make use of the framework. ADEPT interfaces allow parallelization to occur at the optimum level for each algorithm. Amazon Web Services (AWS) consists of a stack of technologies commonly used in the commercial sphere to host web applications designed to scale rapidly with a growing user base. The Amazon Elastic Compute Cloud (EC2) and its auto-scaling feature in particular provide a means of reliably and efficiently scaling a service to meet traffic demands. Hadoop and Spark are top-level Apache projects designed to enable massive parallelization of data processing. Hadoop employs the MapReduce programming model and uses a distributed file system…
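
As a minimal illustration of the REST access point described in the abstract, the Java sketch below posts a document to an annotation service and prints the response. The endpoint URL, route, and JSON payload here are placeholders assumed for illustration; the actual ADEPT REST API defines its own routes and serialized data structures, which are not reproduced here.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RestClientSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint and payload for illustration only;
            // the real ADEPT REST API's routes and schema differ.
            String endpoint = "http://localhost:8080/adept/annotate"; // assumed URL
            String document = "{\"text\": \"John visited Boston last week.\"}";

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(endpoint))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(document))
                    .build();

            // The response body would carry the algorithm's annotations,
            // serialized in the framework's uniform data structures.
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }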
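To show what record-level parallelization of the kind the abstract describes might look like in practice, here is a minimal Spark sketch in Java: it distributes sentence-level processing across executors by mapping a trivial annotator (a whitespace tokenizer standing in for a real NLP algorithm) over an RDD of sentences. The input path and the local master setting are assumptions for illustration, not details from the paper.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.util.List;

    public class SentenceLevelDriver {
        public static void main(String[] args) {
            // Standard Spark setup; runs locally here, but the same code
            // scales out on a cluster by changing the master URL.
            SparkConf conf = new SparkConf()
                    .setAppName("nlp-scaling-sketch")
                    .setMaster("local[*]"); // assumed local run
            JavaSparkContext sc = new JavaSparkContext(conf);

            // One sentence per line; the path is a placeholder.
            JavaRDD<String> sentences = sc.textFile("hdfs:///corpus/sentences.txt");

            // Sentence-level processing parallelizes per record: each executor
            // applies the annotator to its own partition of sentences.
            JavaRDD<Integer> tokenCounts =
                    sentences.map(s -> s.trim().split("\\s+").length);

            List<Integer> sample = tokenCounts.take(5);
            System.out.println("Sample token counts: " + sample);
            sc.stop();
        }
    }

Document-level or corpus-level algorithms would parallelize at a coarser granularity, for example one record per document rather than per sentence, which is the kind of per-algorithm choice the ADEPT interfaces are said to allow.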
Date of Conference: 29 October 2015 - 01 November 2015
Date Added to IEEE Xplore: 28 December 2015
Conference Location: Santa Clara, CA, USA


References

References are not available for this document.