Product named entity recognition in Chinese text

Zhao, Jun; Liu, Feifan

doi:10.1007/s10579-008-9066-8

Published: 17 April 2008

Volume 42, pages 197–217, (2008)

Language Resources and Evaluation

Abstract

There are many expressive and structural differences between product names and general named entities such as person names, location names and organization names. To date, there has been little research on product named entity recognition (NER), which is crucial and valuable for information extraction in the field of market intelligence. This paper focuses on product NER (PRO NER) in Chinese text. First, we describe our efforts on data annotation, including well-defined specifications, data analysis and development of a corpus with annotated product named entities. Second, a hierarchical hidden Markov model-based approach to PRO NER is proposed and evaluated. Extensive experiments show that the proposed method outperforms the cascaded maximum entropy model and obtains promising results on the data sets of two different electronic product domains (digital and cell phone).

Notes

For purposes of clarity and precision, the singular forms “product named entity” and “named entity” are abbreviated “PRO NE” and “NE”, respectively, while the plural forms “product named entities” and “named entities” are abbreviated “PRO NEs” and “NEs,” respectively.

References

Aberdeen, J., et al. (1995). MITRE: Description of the ALEMBIC system used for MUC-6. In Proceedings of the Sixth Message Understanding Conference (MUC-6) (pp. 141–155).
Bick, E. (2004). A named entity recognizer for Danish. In Lino et al. (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004), Lisbon (pp. 305–308).
Bikel, D. M., Miller, S., Schwartz, R., & Weischedel, R. (1997). Nymble: A high-performance learning name-finder. In Proceedings of the Fifth Conference on Applied Natural Language Processing (pp. 194–201), ACL.
Borthwick, A. (1999). A maximum entropy approach to named entity recognition. PhD Dissertation. Computer Science Department, New York University.
Carletta, J. (1996). Assessing agreement on classification tasks: The Kappa statistic. Computational Linguistics, 22(2), 249–254.
Collier, N., Nobata, C., & Tsujii, J. (2000). Extracting the names of genes and gene products with a hidden Markov model. In Proceedings of the 18th International Conference on Computational Linguistics (COLING’2000), Saarbrücken, Germany (pp. 201–207).
Fine, S., Singer, Y., & Tishby, N. (1998). The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32(1), 41–62.
Jelinek, F., & Mercer, R. L. (1980). Interpolated estimation of Markov source parameters from sparse data. In D. Gelsema & L. Kanal (Eds.), Pattern recognition in practice. North-Holland.
McCallum, A., Freitag, D., & Pereira, F. (2000). Maximum entropy Markov models for information extraction and segmentation. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), Stanford, CA (pp. 591–598).
Niu, C., Li, W., Ding, J., & Srihari, R. K. (2003). A bootstrapping approach to named entity classification using successive learners. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL 2003), Sapporo (pp. 335–342).
Pierre, J. M. (2002). Mining knowledge from text collections using automatically generated metadata. In Proceedings of Fourth International Conference on Practical Aspects of Knowledge Management (PAKM2002), Vienna (pp. 537–548).
Sekine, S., Grishman, R., & Shinou, H. (1998). A decision tree method for finding and classifying names in Japanese texts. In Proceedings of the Sixth Workshop on Very Large Corpora, Canada, http://www.cs.nyu.edu/~sekine/papers/wvlc98.pdf.
Siegel, S., & Castellan, N. J. (1988). Non-parametric statistics for the behavioral sciences (2nd ed.). McGraw-Hill.
Wu, Y., Zhao, J., & Xu, B. (2003). Chinese named entity recognition combining statistical model with human knowledge. In The Workshop attached with 41st ACL for Multilingual and Mix-language Named Entity Recognition: Combining Statistical and Symbolic Models, Sapporo (pp. 65–72).
Xiong, D., Yu, H., & Liu, Q. (2004). Tagging complex NEs with Maxent models: Layered structures versus extended Tagset. In Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP-04), Sanya (pp. 638–643).
Yi, E., Lee, G. G., & Park, S.-J. (2004). SVM-based biological named entity recognition using minimum edit-distance feature boosted by virtual examples. In Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP-04), Sanya (pp. 22–24).
Yu, S., Duan, H., Zhu, X., Swen, B., & Chang, B. (2003). Word segmentation, POS tagging and phonetic notation. International Journal of The Chinese and Oriental Languages Information Processing Society, 13(2), 121–159.

Acknowledgments

This work was supported by the National High Technology Development 863 Program of China under Grant No. 2006AA01Z144, the National Natural Science Foundation of China under Grant No. 60673042, and the Natural Science Foundation of Beijing under Grants No. 4052027 and 4073043. This research was also carried out as part of a cooperative project with Fujitsu R&D Center Co., Ltd. We would like to thank Dr. Hao YU, Dr. Yingju XIA, and Dr. Fumihito Nishino for helpful conversations and feedback on the corpus. We would also like to thank Dr. Yang LIU of the University of Texas at Dallas, Dr. Ying ZHAO of Tsinghua University, and Mr. Matthew Trueman for their useful suggestions on earlier drafts of the paper. We are grateful to the anonymous reviewers for their very helpful comments on an earlier draft; their insights and suggestions have led to many improvements in the paper.

Author information

Authors and Affiliations

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, China
Jun Zhao & Feifan Liu

Corresponding author

Correspondence to Jun Zhao.

Additional information

This research was conducted under the framework of the Chinese Linguistic Data Consortium (ChineseLDC). In the first phase, ChineseLDC created a series of fundamental Chinese language resources, including Comprehensive Chinese Lexicon, Chinese Grammatical Knowledge Base (frequent words), Word-segmented and POS-tagged Chinese Corpus, Syntactic Treebank, Chinese–English Parallel Corpus, Chinese Semantic Lexicon, etc. Construction of the Product Named Entity Tagged Corpus and development of the Automatic Product Named Entity Recognition Tool are among the tasks of the second phase of ChineseLDC.

Appendix: Peking University’s TagSet for POS Tagging Chinese Texts (Yu et al. 2003)

About this article

Cite this article

Zhao, J., Liu, F. Product named entity recognition in Chinese text. Lang Resources & Evaluation 42, 197–217 (2008). https://doi.org/10.1007/s10579-008-9066-8

Received: 23 August 2006
Accepted: 19 March 2008
Published: 17 April 2008
Issue Date: May 2008
DOI: https://doi.org/10.1007/s10579-008-9066-8
