
Scaling NLP algorithms to meet high demand


Abstract:

The growth of digital information and the richness of data shared online make it increasingly valuable to process large amounts of data at a very high throughput rate. At the same time, rising interest in natural language processing (NLP) has resulted in the development of a great number of algorithms designed to perform a variety of NLP tasks. There is a need for frameworks that enable multiple users and applications to run individual NLP algorithms, or combinations of them, to derive relevant information from data [1]. In this work, we take multiple NLP algorithms that adhere to the ADEPT framework and deploy them on distributed processing architectures to satisfy the dual needs of serving a large user group and meeting high throughput standards, while reducing the time from lab to production environment. The ADEPT framework provides a set of uniform APIs for interacting with a diverse set of NLP algorithms by defining a set of data structures for representing NLP concepts [2]. It offers multiple access points for interacting with these algorithms: a REST API, a serialized Data API, and processor components that can be used in a larger pipeline. The comprehensive ADEPT architecture can support algorithms that perform sentence-level, document-level, or corpus-level text processing, allowing a wide range of NLP algorithms to make use of the framework. ADEPT interfaces allow parallelization to occur at the optimum level for each algorithm. Amazon Web Services (AWS) consists of a stack of technologies commonly used in the commercial sphere to host web applications designed to scale rapidly with a growing user base. The Amazon Elastic Compute Cloud (EC2) and its auto-scaling feature in particular provide a means of reliably and efficiently scaling a service to meet traffic demands. Hadoop and Spark are top-level Apache projects designed to enable massive parallelization of data processing. Hadoop employs the MapReduce programming model and uses a distributed file system…
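
As a minimal illustration of the REST access point described in the abstract, the Java sketch below posts a document to an annotation service and prints the response. The endpoint URL, route, and JSON payload here are placeholders assumed for illustration; the actual ADEPT REST API defines its own routes and serialized data structures, which are not reproduced here.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RestClientSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint and payload for illustration only;
            // the real ADEPT REST API's routes and schema differ.
            String endpoint = "http://localhost:8080/adept/annotate"; // assumed URL
            String document = "{\"text\": \"John visited Boston last week.\"}";

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(endpoint))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(document))
                    .build();

            // The response body would carry the algorithm's annotations,
            // serialized in the framework's uniform data structures.
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }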
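To show what record-level parallelization of the kind the abstract describes might look like in practice, here is a minimal Spark sketch in Java: it distributes sentence-level processing across executors by mapping a trivial annotator (a whitespace tokenizer standing in for a real NLP algorithm) over an RDD of sentences. The input path and the local master setting are assumptions for illustration, not details from the paper.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.util.List;

    public class SentenceLevelDriver {
        public static void main(String[] args) {
            // Standard Spark setup; runs locally here, but the same code
            // scales out on a cluster by changing the master URL.
            SparkConf conf = new SparkConf()
                    .setAppName("nlp-scaling-sketch")
                    .setMaster("local[*]"); // assumed local run
            JavaSparkContext sc = new JavaSparkContext(conf);

            // One sentence per line; the path is a placeholder.
            JavaRDD<String> sentences = sc.textFile("hdfs:///corpus/sentences.txt");

            // Sentence-level processing parallelizes per record: each executor
            // applies the annotator to its own partition of sentences.
            JavaRDD<Integer> tokenCounts =
                    sentences.map(s -> s.trim().split("\\s+").length);

            List<Integer> sample = tokenCounts.take(5);
            System.out.println("Sample token counts: " + sample);
            sc.stop();
        }
    }

Document-level or corpus-level algorithms would parallelize at a coarser granularity, for example one record per document rather than per sentence, which is the kind of per-algorithm choice the ADEPT interfaces are said to allow.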
Date of Conference: 29 October 2015 - 01 November 2015
Date Added to IEEE Xplore: 28 December 2015
Conference Location: Santa Clara, CA, USA


References

References are not available for this document.