Semantic Entity Identification in Large Scale Data via Statistical Features and DT-SVM

Wang, Dingxian; Liu, Xiao; Luo, Hangzai; Fan, Jianping

doi:10.1007/978-3-642-41230-1_30

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8180)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Semantic entities carry the most important semantics of text data. However, traditional approaches such as named entity recognition and new word identification can detect only certain specific types of entities. In addition, they generally adopt sequence annotation algorithms such as the Hidden Markov Model (HMM) and Conditional Random Field (CRF), which can utilize only limited context information. As a result, they are ineffective at extracting semantic entities that never appeared in the training data. In this paper we propose a strategy to extract unknown text semantic entities by integrating statistical features with Decision Tree (DT) and Support Vector Machine (SVM) algorithms. With the proposed statistical features and novel classification approach, our strategy can detect more semantic entities than traditional approaches such as CRF and Bootstrapping-SVM methods. It is also very sensitive to new entities that have only just appeared in fresh data. Our experimental results show that the precision, recall and F1 score of our strategy are on average about 23.6%, 21.5% and 25.8% higher, respectively, than those of the representative approaches.
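
For readers unfamiliar with the three reported measures, the following is a minimal sketch of how precision, recall and F1 are conventionally computed for a set of extracted entities against a gold-standard set. It uses the standard definitions only; the function name `precision_recall_f1` and the toy entity sets are illustrative assumptions and are not taken from the paper, whose exact evaluation protocol is not described in this abstract.

```python
def precision_recall_f1(predicted, gold):
    """Score extracted entities against a gold-standard set (standard definitions)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # entities that were correctly extracted
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data for illustration only, not results from the paper.
gold_entities = {"East China Normal University", "Shanghai", "WISE 2013", "Springer"}
extracted = {"Shanghai", "WISE 2013", "Springer", "Lecture Notes"}

p, r, f = precision_recall_f1(extracted, gold_entities)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Treating entities as sets means duplicates are counted once; an evaluation over entity mentions rather than distinct entities would need multiset counts instead.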

Author information

Authors and Affiliations

East China Normal University, Shanghai, China
Dingxian Wang & Xiao Liu
Northwest University of China, Xi’an, China
Hangzai Luo & Jianping Fan

Editor information

Editors and Affiliations

The University of New South Wales, Sydney, NSW, Australia
Xuemin Lin
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
AT&T Labs-Research, Florham Park, NJ, USA
Divesh Srivastava
Victoria University, Melbourne, Australia
Guangyan Huang

About this paper

Cite this paper

Wang, D., Liu, X., Luo, H., Fan, J. (2013). Semantic Entity Identification in Large Scale Data via Statistical Features and DT-SVM. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41230-1_30

DOI: https://doi.org/10.1007/978-3-642-41230-1_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41229-5
Online ISBN: 978-3-642-41230-1
eBook Packages: Computer Science, Computer Science (R0)
