Abstract
In this paper, we present an efficient method for comparing text with RDF data in order to recognize named entities. Here, a named entity is a text sequence that refers to a URI reference within an RDF graph. We use suffix arrays as the representation format for text and a relational database schema to represent Semantic Web data. With these representations, named entity recognition runs in linear time and does not require the names of existing entities to be held in memory. Both properties are needed to implement named entity recognition at the scale of, for instance, the DBpedia database.
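The abstract does not spell out the matching procedure, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that entity labels sit in a relational table (the hypothetical `entity_label(uri, label)` table below, in an in-memory SQLite database) and are streamed in sorted order, and that the suffixes of the text that start at a word boundary are sorted into a suffix array. Because both sequences are ascending, a single forward-moving pointer locates every label occurrence, so the labels never have to be loaded into an in-memory dictionary. The names `word_start_suffix_array`, `match_entities`, and `recognize` are hypothetical, the URIs are illustrative, and the suffix array is built naively by sorting; the linear bound would require a linear-time construction such as the DC3 algorithm of Kärkkäinen, Sanders, and Burkhardt.

```python
import sqlite3
from typing import Iterable, List, Tuple


def word_start_suffix_array(text: str) -> List[int]:
    """Return start offsets of word-initial suffixes, sorted by suffix text.

    Naive construction (sorting slices) for illustration only; a linear-time
    suffix array algorithm would replace this on large inputs.
    """
    starts = [i for i in range(len(text))
              if text[i].isalnum() and (i == 0 or not text[i - 1].isalnum())]
    return sorted(starts, key=lambda i: text[i:])


def match_entities(text: str, sa: List[int],
                   labels: Iterable[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Merge (label, uri) rows, ascending by label, against the sorted suffixes.

    The lower-bound pointer `p` only moves forward because the labels arrive
    in ascending order; rows are consumed one at a time, never stored as a set.
    """
    hits = []
    p = 0
    for label, uri in labels:
        m = len(label)
        if m == 0:
            continue
        # Advance to the first suffix >= label. Comparing only the first m
        # characters is equivalent to comparing the whole suffix here.
        while p < len(sa) and text[sa[p]:sa[p] + m] < label:
            p += 1
        # All suffixes beginning with `label` form a contiguous run from p.
        # Scan it with a separate index so that p stays valid for longer
        # labels that share this prefix (e.g. "New" before "New York").
        q = p
        while q < len(sa) and text[sa[q]:sa[q] + m] == label:
            end = sa[q] + m
            # Accept only matches that also end at a word boundary.
            if end == len(text) or not text[end].isalnum():
                hits.append((sa[q], label, uri))
            q += 1
    return sorted(hits)


def recognize(text: str, label_rows: Iterable[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Store (uri, label) rows in a hypothetical relational table and match them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE entity_label (uri TEXT NOT NULL, label TEXT NOT NULL)")
        conn.execute("CREATE INDEX entity_label_idx ON entity_label (label)")
        conn.executemany("INSERT INTO entity_label (uri, label) VALUES (?, ?)", label_rows)
        # ORDER BY with SQLite's default BINARY collation compares UTF-8 bytes,
        # which agrees with Python's code-point order for str comparison.
        cursor = conn.execute("SELECT label, uri FROM entity_label ORDER BY label")
        return match_entities(text, word_start_suffix_array(text), cursor)
    finally:
        conn.close()


if __name__ == "__main__":
    # Illustrative DBpedia-style URIs; not data taken from the paper.
    rows = [
        ("http://dbpedia.org/resource/Berlin", "Berlin"),
        ("http://dbpedia.org/resource/Germany", "Germany"),
        ("http://dbpedia.org/resource/New_York_City", "New York"),
        ("http://dbpedia.org/resource/Paris", "Paris"),
    ]
    sentence = "Berlin is the capital of Germany. New York is not in Germany."
    for position, label, uri in recognize(sentence, rows):
        print(position, label, uri)
```

Apart from building the suffix array, the merge pointer passes over each suffix at most once during the whole run, so the remaining cost grows roughly with the text length, the label lengths, and the number of matches rather than with one dictionary lookup per candidate span. Restricting suffixes to word starts and checking the character after each match keeps a label such as `Germany` from matching inside a longer word.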
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Adrian, B., Schwarz, S. (2011). Using Suffix Arrays for Efficiently Recognition of Named Entities in Large Scale. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2011. Lecture Notes in Computer Science, vol 6882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23863-5_43
DOI: https://doi.org/10.1007/978-3-642-23863-5_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23862-8
Online ISBN: 978-3-642-23863-5
eBook Packages: Computer Science; Computer Science (R0)