Using Suffix Arrays for Efficiently Recognition of Named Entities in Large Scale

Adrian, Benjamin; Schwarz, Sven

doi:10.1007/978-3-642-23863-5_43

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6882)

Abstract

In this paper, we present an efficient way to compare text with RDF data in order to recognize named entities. Here, a named entity is a text sequence that refers to a URI reference within an RDF graph. We use suffix arrays as the representation format for text and a relational database schema to represent Semantic Web data. With these representations, named entity recognition runs in linear time and does not require the names of existing entities to be held in memory. Both properties are needed to implement named entity recognition at the scale of, for instance, the DBpedia database.
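The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `example.org` URIs, and the plain Python iterable standing in for a database cursor are all assumptions made for illustration. The text is indexed as a word-level suffix array, and entity labels are streamed one at a time and looked up by binary search, so the full set of entity names never has to be in memory. The suffix array here is built by naive sorting, which is not linear time; a linear-time construction algorithm would replace it at scale.

```python
from typing import Iterable, Iterator, List, Tuple


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return the start positions of all token suffixes in lexicographic order.

    Naive construction by sorting (an assumption for brevity, not linear time).
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens: List[str], sa: List[int], name: List[str]) -> List[int]:
    """Binary-search the suffix array for all suffixes that start with `name`."""
    k = len(name)

    # Lower bound: first suffix whose k-token prefix is >= name.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + k] < name:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose k-token prefix is > name.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + k] <= name:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


def recognize(
    tokens: List[str], labels: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[int, int, str]]:
    """Yield (token position, length in tokens, URI) for every label found.

    `labels` is a hypothetical stream of (URI, label) pairs, for example
    rows read from a relational table; only one row is held at a time.
    """
    sa = build_suffix_array(tokens)
    for uri, label in labels:
        name = label.split()
        if not name:
            continue  # skip empty labels, which would match everywhere
        for pos in find_occurrences(tokens, sa, name):
            yield pos, len(name), uri


text = "Benjamin Adrian works at DFKI in Kaiserslautern".split()
entity_labels = [
    ("http://example.org/Benjamin_Adrian", "Benjamin Adrian"),
    ("http://example.org/DFKI", "DFKI"),
    ("http://example.org/Berlin", "Berlin"),
]
matches = list(recognize(text, entity_labels))
```

In this sketch, `matches` contains one entry for "Benjamin Adrian" at token position 0 and one for "DFKI" at token position 4; "Berlin" does not occur in the text and produces no match. Each lookup costs a logarithmic number of prefix comparisons in the text length, independent of how many entity labels exist.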

Author information

Authors and Affiliations

Knowledge Management Department, DFKI GmbH, Kaiserslautern, Germany
Benjamin Adrian & Sven Schwarz

Editor information

Editors and Affiliations

Institute of Integrated Sensor Systems, University of Kaiserslautern, Erwin-Schroedinger-str. 12, 67663, Kaiserslautern, Germany
Andreas König
Knowledge-Based Systems Group, Department of Computer Science, University of Kaiserslautern, P.O. Box 3049, 67653, Kaiserslautern, Germany
Andreas Dengel
School of Business, University of Applied Sciences Northwestern Switzerland, Riggenbachstr. 16, 4600, Olten, Switzerland
Knut Hinkelmann
Graduate School of Engineering, Osaka Prefecture University, 1-1 Gakuen-cho, 599-8531, Sakai, Osaka, Japan
Koichi Kise
KES International, P.O. Box 2115, BN43 9AF, Shoreham-by-sea, UK
Robert J. Howlett
University of South Australia, Adelaide, 5095, Mawson Lakes, SA, Australia
Lakhmi C. Jain

Cite this paper

Adrian, B., Schwarz, S. (2011). Using Suffix Arrays for Efficiently Recognition of Named Entities in Large Scale. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2011. Lecture Notes in Computer Science, vol 6882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23863-5_43

DOI: https://doi.org/10.1007/978-3-642-23863-5_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23862-8
Online ISBN: 978-3-642-23863-5
eBook Packages: Computer Science, Computer Science (R0)