Heterogeneous Named Entity Similarity Function

Kocoń, Jan; Piasecki, Maciej

doi:10.1007/978-3-642-32790-2_27

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7499)

Abstract

Many text processing tasks require recognizing and classifying named entities. Currently available morphological analysers for Polish cannot handle unknown words, that is, words not included in the analyser's lexicon. Polish is a richly inflected language, so comparing two word forms, even ones that share a lemma, is a non-trivial task. The aim of the similarity function is to match an unknown word form with the corresponding word form in a named-entity dictionary. This article presents a complex similarity function based on a decision function implemented as a logistic regression classifier. The final similarity function combines several simple metrics by means of the classifier. The proposed function is very effective in the word form matching task.
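The general idea of feeding several simple string metrics into a logistic regression classifier can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the three metrics (normalised Levenshtein similarity, common-prefix ratio, length ratio), the tiny hand-made training set of Polish word pairs, the plain stochastic gradient descent training loop, and all function names.

```python
import math

def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def features(a: str, b: str) -> list:
    """Three simple metrics in [0, 1]; this choice is hypothetical, not the paper's."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    lev_sim = 1.0 - levenshtein(a, b) / longest
    prefix = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        prefix += 1
    return [lev_sim, prefix / longest, min(len(a), len(b)) / longest]

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(pairs, labels, epochs=2000, lr=0.5):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    xs = [features(a, b) for a, b in pairs]
    w, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            err = sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            bias -= lr * err
    return w, bias

def similarity(a: str, b: str, w, bias) -> float:
    """Combined similarity: classifier probability that a and b match."""
    return sigmoid(bias + sum(wi * xi for wi, xi in zip(w, features(a, b))))

# Hypothetical toy data: (inflected form, dictionary form), 1 = same entity.
PAIRS = [
    ("Wrocławia", "Wrocław"), ("Warszawie", "Warszawa"),
    ("Krakowem", "Kraków"), ("Piaseckiego", "Piasecki"),
    ("Poznaniu", "Poznań"),
    ("Wrocławia", "Warszawa"), ("Krakowem", "Kocoń"),
    ("Piaseckiego", "Poznań"), ("Warszawie", "Wrocław"),
    ("Poznaniu", "Kraków"),
]
LABELS = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

WEIGHTS, BIAS = train(PAIRS, LABELS)

if __name__ == "__main__":
    for form, entry in [("Gdańska", "Gdańsk"), ("Gdańska", "Gdynia")]:
        print(form, entry, round(similarity(form, entry, WEIGHTS, BIAS), 3))
```

The classifier output is used directly as the similarity score, so an unknown word form can be matched to the dictionary entry with the highest score. A realistic setup would need far more training pairs and a held-out evaluation set.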

Author information

Authors and Affiliations

Institute of Informatics, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, Wrocław, Poland
Jan Kocoń & Maciej Piasecki

Editor information

Editors and Affiliations

Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Department of Information Technologies, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Aleš Horák, Ivan Kopeček & Karel Pala

About this paper

Cite this paper

Kocoń, J., Piasecki, M. (2012). Heterogeneous Named Entity Similarity Function. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol 7499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32790-2_27

DOI: https://doi.org/10.1007/978-3-642-32790-2_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32789-6
Online ISBN: 978-3-642-32790-2
eBook Packages: Computer Science, Computer Science (R0)
